Top tagging is a recent approach to identifying boosted hadronic top quarks. It avoids reconstructing
individual top decay products and instead uses a jet algorithm to reconstruct the entire
top decay. Quite generally, geometrically large jets including heavy particles (fat jets) can be analyzed
on the level of their subjet constituents. LHC data will soon allow us to establish this new
analysis method. We discuss different tagging algorithms, their critical QCD aspects, and currently 
available experimental results. For the development of taggers and their different applications this 
review should provide a firm theoretical and algorithmic background. 
I. PHYSICS CASE

The top quark [1] is the only observed fermion with a weak-scale mass and the only quark which decays
before it hadronizes [2]. At the LHC it should be a perfect laboratory to study the mechanism of electroweak
symmetry breaking, orthogonal to searches for a scalar Higgs boson. Because the top quark also induces the 
largest quantum corrections to the bare Higgs mass it lives in the center of all new physics models motivated 
by the hierarchy problem [3]. Many such extensions of the Standard Model, like supersymmetry or little 
Higgs models, predict top partners which naturally decay to top quarks. Other models, like extra dimensions 
with geometries linked to the mass hierarchy in the Standard Model, predict large couplings of new states 
to top quarks, again giving rise to new physics decays to top quarks. 

The problem is that in a high-multiplicity QCD environment the reconstruction of top quarks is difficult.
Obviously, two leptonically decaying top quarks can be observed well through their decay leptons and missing 
transverse energy. Because of the two invisible neutrinos a full reconstruction of leptonic top pairs is not 
possible. A semi-leptonically decaying top pair can be reconstructed approximately. For this we replace 
the missing 4-momentum of the neutrino with the assumption that we know its mass, the two-dimensional 
measured missing transverse momentum vector, and at least one of the two W and top on-shell mass 
constraints. The problem is that these assumptions make it hard to disentangle Standard Model top quarks 
for example from top partners decaying to a top quark and missing energy. Finally, purely hadronically 
decaying top pairs produce six decay jets with two b-tags plus any number of QCD jets. The mass scale of
QCD radiation and the W decay mass scale $m_W/2$ are very similar. Reliably reconstructing such a top pair
event will be a serious challenge at the LHC. This list of channels and their individual challenges supports 
the claim that top identification is one of the hardest tasks in LHC experiments. This is the reason why to 
date there exist essentially no published search limits of top partners [4]. On the other hand, following theory
arguments we really want to study top pairs in signal or background events, including a full reconstruction. 
This is only possible for hadronic top decays. 

Two main motivations to study such hadronic top quarks in a boosted regime exist, one theoretical and 
one experimental. From a theory perspective heavy s-channel resonances preferably decaying to top quarks 
will produce highly boosted top quarks. They require a dedicated analysis strategy because the large boost 
makes traditional analysis methods hard [5, 6]. Similarly, models with top partners and a weakly interacting
dark matter agent naturally predict scenarios where top partners are pair-produced at the LHC and then
decay into a top quark pair plus missing energy [7]. The higher the typical new physics mass scale is pushed
by other LHC searches, the more boosted also these top quarks will become. 

Secondly, a very general experimental concern with LHC events involving top quarks is the high jet 
multiplicity. The associated combinatorial backgrounds are a problem for many LHC searches; the best 
known victim of such combinatorics is the Higgs search based on $t\bar{t}H$ events with hadronic $H \to b\bar{b}$ decays [8-10].
An identification algorithm for hadronic top quarks which automatically takes care of jet combinatorics
should have the potential to revive a sizeable number of LHC analyses. One phase space region where this
seems feasible is that of boosted top quarks, because the different top decay jets and QCD radiation are well
separated from one another.

The idea of studying the substructure of jets is already a classic [11]. It is closely tied to the development of
recombination jet algorithms. Its impact in solving the two problems laid out above has only been recognized
recently. The LHC will for the first time produce enough boosted top quarks (or other heavy Standard Model
particles) to systematically study these analysis ideas. As an illustration, in the left panel of Fig. 1 we show
the geometric size of a top decay as a function of the transverse momentum of the decaying top. Values of
$\Delta R_{bjj} \sim \pi$ imply that in the transverse plane the top decay fills an entire hemisphere in the central region.
In the right panel of Fig. 1 we show the number of top quarks we can tag inside a fat jet (see footnote 1) of size R = 1.5.
The minimum transverse momentum we can probe is around 200 GeV. At the LHC, this corresponds to
O(5-10%) of all top pairs. If we start with 60000 top pairs in the currently available 5 fb$^{-1}$ sample, several
thousand of them should have a sufficient boost, so we can test top taggers on them. The recent FastJet
update [12] allows larger jet sizes, so increasing the fat jet size to R = 1.8 can double the number of available
top quarks in ATLAS and CMS. Now is the right time for detailed studies of top taggers in particular and
subjet methods in general on LHC data.

Footnote 1: The name fat jet refers to geometrically large jets which include a heavy particle decay.

Figure 1: Approximate C/A distance of all three top decay products for Standard Model $t\bar{t}$ events as a function of
$p_{T,t}$. The left panel shows all events while the right panel shows only tagged events using C/A fat jets of size R = 1.5.
Figure from Ref. [8].

In this review we summarize the recent developments on top taggers. This will not include a comprehensive 
comparison of the performance of different taggers, which has to be left to data analyses. Neither do we 
spend much effort describing possible analysis ideas which fat jet analyses will eventually allow. The task is 
simply to describe different approaches to top tagging, including a wide variety of algorithms with individual 
advantages and challenges. For more detailed comparisons of the performance of different taggers we refer 
to the frequent BOOST proceedings [13, 14] which present the state of the art. What we provide are the
underlying concepts and algorithms. 

As an audience for this review we envision graduate students or researchers who would like to enter this 
field or who are looking for a comprehensive discussion of the relevant ideas and their realizations. We 
limit ourselves to top tagging only and neglect for example (most) Higgs taggers or other related tools. 
For alternative reviews about jet substructure we refer for example to Refs. [13-16]. Going beyond the top
tagging focus would have made this review considerably less coherent, and as far as we can tell the concepts 
used and realized in top taggers cover the entire substructure field well. 

In its current form we recommend reading through each of the main parts of this review from the beginning 
to the end. The discussions of the taggers as well as of the QCD effects have a pedagogical setup, and even 
though the individual sections are self-contained enough to maybe serve as a reference book, some information
will always be hidden in introductory discussions. 

This review consists of three major parts: first, in Sec. II we will introduce all major top tagging algorithms.
Major in this context is defined as 'likely to be tested on LHC data soon'. All described taggers include a set
of tunable parameters; however, explicit numbers for kinematic cuts are only illustrations and should be taken
with a grain of salt. On the other hand, the different algorithms have very different physics backgrounds,
so it is useful to discuss each of them in some detail. In the second part, Sec. III, we discuss the related QCD
issues, most notably different ways of dealing with soft QCD effects, underlying event, and pile-up. The
different approaches were historically developed for individual taggers and often have a major impact on their
performance. On the other hand, as far as they contribute to the definition of the physics objects of the
actual top tagging algorithms they can be relatively easily exchanged between and added to taggers. Finally,
in Sec. IV we discuss some of the published experimental results. This part will suffer from the fact that
most ATLAS and CMS results on top tagging are at best published in internal notes, so we will be very
brief.



II. TOP TAGGING ALGORITHMS 

Top tagging algorithms are typically based on two classes of observables. On the one hand, we can 
generalize the well established event shapes to jet shapes, i.e. observables defined on calorimeter clusters 
of the energy flow inside a geometrically large fat jet. Such jet shapes are directly accessible by the LHC 
detectors. For our purpose the most relevant jet shape is the jet mass, on which all top tagging algorithms 
are based. The second class of observables is the clustering history of all jet constituents. This history cannot 
be observed directly. Instead, we have to rely on our understanding of QCD to simulate it, based on the 
energy depositions we observe in the calorimeters (and trackers). 

To backwards engineer the splitting history of a jet we can use our picture of collinear quark and gluon
splittings predicted by first principles QCD. The successive splitting of quarks and gluons radiated off an
n-particle hard process ($\sigma_n$) factorizes in the soft or collinear limits into the simple form

$$\sigma_{n+1} = \int \sigma_n \, \frac{dp_j^2}{p_j^2} \, dz \, \frac{\alpha_s}{2\pi} \, P_{j \to j_1 j_2}(z) , \tag{1}$$

where $p_j$ is the momentum of the splitting parton and z is the energy fraction of one of the splitting products
in $j \to j_1 j_2$. The different splitting kernels P(z) depend on the partonic quark or gluon process and are known.
They often diverge in the soft limit $z \to 0$, so we will encounter an overlapping enhancement and eventually
divergence for soft and for collinear radiation [17, 18]. The factorization shown in Eq. (1) describes the
splitting of parton radiation off incoming as well as off outgoing hard partons until the radiated partons
become soft enough to hadronize. The numerical implementation of Eq. (1) is the parton shower, and it
describes the transition from hard partons to a large number of hadrons which eventually decay and appear
in the calorimeters of the LHC experiments.

Inverting this successive splitting and hence extracting a hard parton momentum from a measured jet is 
what jet algorithms do. Historically, an important issue is the infrared safety of observables and algorithms; 
a soft or collinear splitting of any parton momentum cannot impact the macroscopic observables. While 
some cone algorithms are not collinear safe, recombination algorithms are. Such recombination algorithms
iteratively determine which of the observed calorimeter towers should be merged into subjets and which of
these subjets should then be merged together step by step, such that finally we arrive at few hard jets per
event. The end of this successive merging can be defined in terms of a given minimum jet separation or a
given maximum number of jets. Different recombination algorithms are based on different subjet distance 
measures: 

$$
\begin{array}{lll}
k_T & d_{j_1 j_2} = \dfrac{\Delta R^2_{j_1 j_2}}{D^2} \min\left(p^2_{T,j_1}, p^2_{T,j_2}\right) & d_{j_1 B} = p^2_{T,j_1} \\[2mm]
\text{Cambridge/Aachen} & d_{j_1 j_2} = \dfrac{\Delta R^2_{j_1 j_2}}{D^2} & d_{j_1 B} = 1 \\[2mm]
\text{anti-}k_T & d_{j_1 j_2} = \dfrac{\Delta R^2_{j_1 j_2}}{D^2} \min\left(\dfrac{1}{p^2_{T,j_1}}, \dfrac{1}{p^2_{T,j_2}}\right) & d_{j_1 B} = \dfrac{1}{p^2_{T,j_1}}
\end{array} \tag{2}
$$

These measures can be generalized to $d_{j_1 j_2} = \Delta R^2_{j_1 j_2}/D^2 \times \min(p^{2n}_{T,j_1}, p^{2n}_{T,j_2})$ for n = -1, 0, 1. The $k_T$
algorithm [19] mimics the soft and collinear enhancement of the QCD splitting kernels in Eq. (1). For the
top tagging application it should best reconstruct the QCD splitting history. The Cambridge/Aachen (C/A)
algorithm [20] always combines the two closest (most collinear) subjets. It is sensitive to collinear but not to
soft splittings, but as we will see later it has some advantages in fat jet searches. The anti-$k_T$ algorithm [21]
first combines the hardest subjets, to define a particularly stable jet recombination with clean geometric jet
boundaries. Intermediate subjets based on the anti-$k_T$ algorithm bear no resemblance to what we would
expect from QCD. All three algorithms are available through the FastJet package [12].
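
The role of the exponent n in the generalized distance measure can be explored with a few toy pseudo-particles. The following minimal Python sketch only evaluates the pairwise distances and reports which pair would be merged first; the function names, the toy momenta, and the omission of the beam distances are assumptions made for illustration, and any real analysis would rely on the FastJet package [12] instead.

```python
import itertools
import math

def delta_r2(a, b):
    """Squared distance in the (rapidity, azimuth) plane; a, b = (pT, y, phi)."""
    dphi = abs(a[2] - b[2])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (a[1] - b[1]) ** 2 + dphi ** 2

def pair_distance(a, b, n, D=1.0):
    """Generalized measure DeltaR^2 / D^2 * min(pT1^(2n), pT2^(2n)).

    n = 1 gives the kT measure, n = 0 Cambridge/Aachen, n = -1 anti-kT.
    """
    return delta_r2(a, b) / D**2 * min(a[0] ** (2 * n), b[0] ** (2 * n))

def first_merge(particles, n, D=1.0):
    """Index pair with the smallest pairwise distance (beam distances ignored)."""
    pairs = itertools.combinations(range(len(particles)), 2)
    return min(pairs, key=lambda ij: pair_distance(particles[ij[0]], particles[ij[1]], n, D))

# Hypothetical toy event: (pT in GeV, rapidity, azimuth)
toy = [
    (100.0, 0.0, 0.0),  # 0: hard
    (1.0, 0.0, 0.5),    # 1: soft
    (0.5, 0.0, 1.0),    # 2: soft
    (20.0, 0.0, 2.0),   # 3: medium
    (20.0, 0.0, 2.4),   # 4: medium, closest to 3
]

for name, n in (("kT", 1), ("C/A", 0), ("anti-kT", -1)):
    print(name, first_merge(toy, n))
```

In this toy configuration the $k_T$ measure starts with the two softest objects, C/A with the geometrically closest pair, and anti-$k_T$ with a pair involving the hardest object, in line with the qualitative description above.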

Closely related to the $k_T$ and C/A measures is the JADE distance [22], which essentially is a transverse
subjet mass:

$$d_{j_1 j_2} = p_{T,j_1} \, p_{T,j_2} \, \Delta R^2_{j_1 j_2} \sim m^2_{j_1 j_2} . \tag{3}$$

In this notation we label the splitting partons as well as the reconstructed subjets in the recombination
algorithms as $j_i$. In the remainder of the paper we will only use subjets, so this notation does not pose
any problems. Moreover, we will refer to all intermediate clusterings inside all recombination algorithms as
subjets. More stable objects, like filtered subjets, we will introduce in Sec. III.



Independent of the choice of subjets to be merged by the jet algorithm we also have to define a scheme for 
the combination of the two 4-momenta. In particular when looking for massive jets we should not assume 
anything about the mass of the partons. Instead, we can simply add the two 4-vectors $p_j = p_{j_1} + p_{j_2}$ in the
E-scheme. The subjet mass is defined as $m_j^2 = p_j^2$. In most (soft or collinear) QCD splittings it should not
exceed the B meson mass, and even including detector effects we usually find $m_j < 30$ GeV in the absence
of massive weak-scale splittings. 
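
As a small illustration of the E-scheme, the sketch below adds two 4-vectors component by component and computes the invariant mass of the sum. The helper names and the example momenta are hypothetical; the only physics input is 4-vector addition and $m^2 = E^2 - |\vec{p}|^2$.

```python
import math

def four_vector(pt, y, phi, m=0.0):
    """Return (E, px, py, pz) from transverse momentum, rapidity, azimuth and mass."""
    mt = math.sqrt(pt * pt + m * m)  # transverse mass
    return (mt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), mt * math.sinh(y))

def add_e_scheme(p1, p2):
    """E-scheme recombination: plain addition of the two 4-vectors."""
    return tuple(a + b for a, b in zip(p1, p2))

def mass(p):
    """Invariant mass; tiny negative m^2 from rounding is clipped to zero."""
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

# Two massless subjets with pT = 100 GeV each, separated by 0.8 in azimuth
j1 = four_vector(100.0, 0.0, 0.4)
j2 = four_vector(100.0, 0.0, -0.4)
parent = add_e_scheme(j1, j2)
print(round(mass(parent), 2))  # jet mass of the combined subjet in GeV
```

Although both inputs are massless, the merged subjet acquires a mass of order the W mass, which is the kind of massive splitting the taggers in this section look for.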

In contrast to the dynamic clustering history which we can think of as a time evolution, jet shapes are 
observables based on the final jet constituents. A priori, it is not clear that these two approaches include
the same information. Therefore, the comparison of different top taggers is first of all an interesting QCD 
experiment. 

Because different top taggers rely on very different jet shapes we will not introduce them in general here. 
The definitions are often inherited from event shapes, most notably thrust [23]. Unlike jet clustering
histories, which depending on the underlying jet algorithm are either theoretically well defined (i.e. infrared
safe) or not, jet shapes have to be classified one by one. Much work has for example gone into appropriate
definitions of the jet mass, introduced above [24] . 

The kinematics underlying this jet mass, assuming widely separated jets with a good 4-momentum reconstruction,
is fairly simple. Following our QCD picture, it is based on successive ($1 \to 2$) splittings. If one
of these splittings corresponds to the $t \to Wb$, $W \to jj$, or even $H \to b\bar{b}$ decay, the corresponding jet mass
should be around the electroweak scale. In the leading logarithmic approximation we can describe a massive
jet composed out of two subjets using [25]

$$\frac{m_j^2}{p_{T,j}^2} \simeq z(1-z) \, \Delta R^2_{j_1 j_2} \qquad \text{with} \quad z = \frac{p_{T,j_1}}{p_{T,j}} . \tag{4}$$
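
The leading-logarithmic relation between jet mass, transverse momentum and subjet separation can be inverted to estimate how far apart the two subjets of a boosted heavy particle are. The sketch below does this under the assumption that the relation reads $m_j^2 \simeq z(1-z)\, p_{T,j}^2\, \Delta R^2_{j_1 j_2}$; the function name and the example values are illustrative only.

```python
import math

def opening_angle(m, pt, z=0.5):
    """Estimate DeltaR between two subjets from m^2 ~ z (1 - z) pT^2 DeltaR^2.

    m  : mass of the decaying particle in GeV
    pt : transverse momentum of the parent jet in GeV
    z  : momentum fraction carried by one subjet (0 < z < 1)
    """
    return m / (pt * math.sqrt(z * (1.0 - z)))

# Symmetric W decay (z = 1/2) at pT = 320 GeV
print(round(opening_angle(80.4, 320.0), 4))
# Asymmetric splittings are wider for the same mass and pT
print(round(opening_angle(80.4, 320.0, z=0.1), 4))
```

For a symmetric splitting this reduces to $\Delta R \simeq 2m/p_T$, so the separation shrinks as the boost grows, which is why the jet size has to be matched to the transverse momentum range of interest.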



As mentioned above, all top taggers include at least one jet shape, namely the jet mass. The early subjet
tools combine the jet mass with a clustering history. This includes the first W and top taggers by Mike
Seymour (Sec. II A), the W predecessor to YSplitter (Sec. II A) and the BDRS Higgs tagger (Sec. II A).

More advanced tools like YSplitter (Sec. II B), the Seattle or pruning tagger (Sec. II C), the Johns Hopkins
tagger (Sec. II D), the HEPTopTagger (Sec. II E), or the Thaler-Wang tagger (Sec. II F) supplement the jet
mass with a detailed analysis of the clustering history. Differences between them arise because of different
jet algorithms and different selection criteria to extract the massive $t \to Wb$ and $W \to jj$ splittings.

Following the success of event shapes at LEP, the N-subjettiness tagger (Sec. II G), the template method
(Sec. II H), or the tree-less algorithm (Sec. II I) are exclusively based on (sub-)jet shapes. The choice of jet
algorithms in this approach does not play any role, except for removing underlying event and pile-up, as we
will discuss in Sec. III.



Testing which family of taggers is best suited for studies of the inside of jets will shed light on experimental 
QCD issues way beyond the identification of top jets. For example at LEP, event shapes became the standard
tools for any kind of precision QCD measurement, such as the $\alpha_s$ measurement. At the Tevatron,
simple cone jets were used most of the time because they were deemed to be most stable in the hadron
collider environment. At the LHC we are already observing serious problems with pile-up, even though the
collider energy and luminosity are still small. In this massively complex QCD environment it is not at all 
clear how we will analyze QCD effects in the coming years. Ongoing jet substructure and top tagging studies 
will significantly contribute to answering this very fundamental question. 



A. Early developments 



The first tagger for W bosons and top quarks was developed in 1994 [11]. According to its author it
was meant to illustrate the power of new jet algorithms and to replace the at that time common cone 
algorithms with recombination algorithms. The basic idea is that step-wise combining calorimeter towers 
to jets by following geometric and energy flow patterns includes more information than simply collecting 
all towers within a certain R distance. Very early clustering algorithms were developed and used by the 
JADE collaboration [22], based on the distance measure Eq. (3), which unfortunately does not reflect the
soft-collinear splitting kernels which we can derive from QCD [17]. A more appropriate clustering history
includes valuable information on the content and on the origin of a jet. Note that such an additional source of 
information bypasses for example the definitions of optimal use of information using matrix element methods, 
because it acts on the objects of our usual analyses, not on correlations between known objects. 

The original W-tagger [11] is based on the $k_T$-clustering algorithm. It starts with a fat jet of size R = 1.0,
extracts the two hardest subjets, and then cuts on the R distance between the two subjets, $R_{j_1,j_2} > 0.25$, and
between the fat jet and each of the subjets, $R_{j,j_i} < 0.81$. Subjets with an energy below 17 GeV are excluded
from the W reconstruction. The two subjets finally have to reconstruct the W mass to ±10 GeV. As we will 
see later, this strategy defines most relevant ingredients of the corresponding modern Higgs and top taggers. 
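
The selection of this early W-tagger can be summarized as a short list of cuts. The sketch below encodes them for a fat jet axis and its two hardest subjets, assuming that these objects and their invariant mass have already been obtained from a clustering step not shown here. The dictionary layout and function names are hypothetical, and a subjet below the energy threshold is simply treated as a failed candidate, which is a simplification.

```python
import math

def delta_r(a, b):
    """Distance in the (rapidity, azimuth) plane for objects with keys 'y' and 'phi'."""
    dphi = abs(a["phi"] - b["phi"])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a["y"] - b["y"], dphi)

def passes_w_tag(fat_jet, sub1, sub2, m_jj, m_w=80.4,
                 r_min=0.25, r_max=0.81, e_min=17.0, window=10.0):
    """Apply the cuts quoted in the text to one W candidate."""
    if min(sub1["E"], sub2["E"]) < e_min:        # subjets must be energetic enough
        return False
    if delta_r(sub1, sub2) <= r_min:             # subjets must be resolved
        return False
    if max(delta_r(fat_jet, sub1), delta_r(fat_jet, sub2)) >= r_max:
        return False                             # subjets must sit inside the fat jet
    return abs(m_jj - m_w) <= window             # mass window around m_W

fat = {"y": 0.0, "phi": 0.0}
s1 = {"E": 120.0, "y": 0.0, "phi": 0.3}
s2 = {"E": 90.0, "y": 0.0, "phi": -0.3}
print(passes_w_tag(fat, s1, s2, m_jj=82.0))
```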

For comparison, a similar tagger based on a cone algorithm [11] uses the same two cone radii of 0.81
and 0.25 to look for a large jet which includes two well separated smaller jets with transverse energy above
20 GeV. Those two subjets also have to reconstruct $m_W$ within ±10 GeV. In the left panel of Fig. 2 the
dashed and dot-dashed curves show the results from the clustering and cone algorithms. The clustering
algorithm gives a significantly narrower distribution centered well around the true $m_W = 80.4$ GeV.

Serious differences in the performance of these two taggers arise once we include hadronization and underlying
event. Hadronization will clearly be better described by the clustering algorithm. On the other hand,
underlying event and eventually pile-up will compromise the clustering result because soft QCD activity
inside the fat jet will eventually be included in the jet mass, cut off only by the global minimum energy
requirement for example from individual detector cells. The double cone algorithm will ignore a large part
of the original large cone when it reconstructs the W mass from the two small cones. A drop by a factor
0.81/0.25 = 3.24 in the jet radius corresponds to an order of magnitude in area, which is the relevant measure
for any kind of uncorrelated hadronic activity.

Figure 2: Reconstructed W mass (left) and top mass (right) from cluster (solid and dashed) and cone (dotted and
dot-dashed) algorithms. The solid and dotted curves include underlying event. For the top mass we also show the
mass reconstructed from partons instead of subjets as the double-dot-dashed line. Figures from Ref.



Based on this observation, in Sec. III we introduce different ways of removing underlying event and pile-up
from recombination algorithms. The main idea [11] is to re-cluster all particles which we find inside the
R = 1.0 fat jet with an adapted smaller size, e.g. $R = 0.6 \times R_{j_1 j_2}$, and reconstruct the W mass from these
cleaner subjets. The solid and dotted curves in the left panel of Fig. 2 again show the $m_W$ distributions for
both algorithms, including underlying event and cleaned subjets for the mass reconstruction. The advantage 
of the clustering algorithm is still visible, but it is significantly reduced. For the remaining discussions in this 
review that implies that the power of subjet techniques at the LHC will be largely decided by our shielding 
of the clustering history from uncorrelated hadronic activity. 



The top tagger presented in the same work is optimized for a top mass 'less than about 200 GeV' in
semi-leptonically decaying top pairs at the Tevatron (before they were actually discovered in 1995). The
reconstructed lepton guarantees the triggering of the event. Additional b-tagging is not employed, so we
should see four top decay jets plus any number of QCD radiation jets per event. Three mass combinations
$m_{jjj}$, $m_{jj}$ and $m_{\ell\nu j}$ should reproduce $m_W$ and $m_t$, so a $\chi^2$ test is well suited to solve the combinatorics in
the decay jet assignment. The minimum $p_T$ value for subjets to be considered is 15 GeV.

In the right panel of Fig. 2 we show the top mass distributions for both jet algorithms with and without
underlying event. The true value $m_t = 150$ GeV can be seen in the parton-level reconstruction (double-dot-dashed).
The cluster algorithm in the presence of underlying event at the Tevatron should reconstruct
the top mass to better than 10%. Compared to newer developments in this direction there are two aspects
missing: first, there is no information on the reconstruction of the entire 4-momentum of the heavy particles;
second, there is no explicit mention of the advantageous boosted phase space regions as compared to the
overwhelming QCD backgrounds. Nevertheless, many of the taggers discussed later in this section closely
follow the ideas of Ref. [11].

A second early study on the use of taggers for heavy particles in the Standard Model applies a W-tagger
to high energy WW scattering at a 14 TeV LHC, i.e. the regime where in the absence of a fundamental
Higgs scalar the Standard Model description breaks down because it violates unitarity. Of a semi-leptonically
decaying W pair we again use the decay lepton to guarantee triggering. Two forward tagging jets [27] are
not part of the central event, so we can ignore them in the reconstruction of the WW system. The analysis
is driven towards subjet techniques because for $p_{T,W} > 320$ GeV the standard $k_T$-algorithm in the vast
majority of all events (98%) cannot resolve the two W decay jets.

This analysis first studies the scale y at which the $k_T$-algorithm clusters subjets. In a slightly modified
way from Eq. (2) a dimensionless separation measure can be defined as

$$y_{j_1 j_2} = 2 \left(1 - \cos\theta_{j_1 j_2}\right) \frac{\min\left(p^2_{T,j_1}, p^2_{T,j_2}\right)}{p^2_{T,j}} . \tag{5}$$

If we search for a hadronic W decay inside a jet we can force the jet algorithm to produce exactly two
subjets, so a general clustering history is reduced to one y value. For jets originating from a heavy W boson
we expect this relevant splitting to occur at $y \, p_T^2 \sim m_W^2$, see Eq. (4). For QCD events with no hard scale
above $O(m_b)$ appearing in shower and hadronization, all y values should be much smaller. In Fig. 3 we see
that a cut $1.6 < \log(p_T \sqrt{y}) < 2.0$ extracts the signal well and efficiently removes the W+jets continuum
background. Numerically, $p_T \sqrt{y} = 10^{1.9}$ GeV $\approx 79$ GeV indeed corresponds to the W mass. Correlated to this
cut on the splitting history, an additional cut on the jet mass of the W candidate of 70...90 GeV helps reject
the QCD backgrounds further. The direct generalization of this approach to top tagging at the LHC leads
to the YSplitter or ATLAS top tagger, which we will discuss in Sec. II B.
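
The window cut on the splitting scale is easy to check numerically. The sketch below assumes the small-angle limit, in which $2(1-\cos\theta) \approx \Delta R^2$, and a y value normalized to the squared transverse momentum of the jet, so that $p_T \sqrt{y}$ reduces to $\min(p_{T,j_1}, p_{T,j_2}) \, \Delta R_{j_1 j_2}$. Both the normalization and the function names are assumptions made for illustration; the logarithm is taken to base 10, consistent with $10^{1.9}$ GeV $\approx 79$ GeV.

```python
import math

def splitting_scale(pt1, pt2, delta_r):
    """pT * sqrt(y) in GeV in the small-angle limit: min(pT1, pT2) * DeltaR."""
    return min(pt1, pt2) * delta_r

def passes_window(scale, lo=1.6, hi=2.0):
    """Window cut lo < log10(pT sqrt(y) / GeV) < hi."""
    return lo < math.log10(scale) < hi

print(round(10 ** 1.9, 1))                                # centre of the signal region in GeV
print(passes_window(splitting_scale(100.0, 220.0, 0.8)))  # W-like splitting
print(passes_window(30.0))                                # softer, QCD-like splitting
```

The window translates into roughly 40 to 100 GeV for $p_T \sqrt{y}$, which brackets the W mass from both sides.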

The third of the early and ground-breaking jet substructure papers applies a combination of the two
approaches presented above and constructs the BDRS Higgs tagger [28]. The motivation for such a Higgs
tagger is easy to see: for Higgs masses around 120 GeV two thirds of all Higgs bosons decay to a pair of
bottom quarks. If we want to study and measure properties of the Higgs sector at the LHC, a measurement
of this branching ratio is mandatory [29]. Without such a measurement, the denominator of any LHC
counting measurement $\sigma \times \text{BR}$ and with it any link between the measured rate and the Higgs couplings is
undetermined.

Figure 3: Left: Normalized $p_T \sqrt{y}$ distributions in the W candidate jet for a massive WW system decaying to
two relativistic semi-leptonic W bosons (dots), $t\bar{t}$ events (triangles), and W+jet production (squares). Figure from
Ref. [26].

Inclusive Higgs production as well as Higgs production in weak boson fusion with a hadronic Higgs decay
will not be observed at the LHC, most likely not even triggered. Associated Higgs production either with a
V = W, Z boson or with a top quark pair has been studied in detail over many years, but the continuum
QCD backgrounds $Wb\bar{b}$ and $t\bar{t}b\bar{b}$ might well be impossible to suppress using the standard techniques.

The BDRS Higgs tagger [28] in its optimized setup is based on the C/A jet algorithm in combination
with a mass drop criterion in the jet un-clustering. It individually analyzes clusterings where the original
jet algorithm combines $j_1$ and $j_2$ into j. The iterative un-clustering and selection goes through the following
steps:

1. Un-do the last step of the fat jet clustering where the (parent) subjet j breaks into two (daughter)
subjets $j_{1,2}$. For the BDRS Higgs tagger we use the C/A jet algorithm.

2. Test three conditions: first, the drop in jet mass has to be large for a heavy particle decay; second, the
splitting should then be symmetric; finally the subjets $j_{1,2}$ have to be sufficiently hard:

$$\frac{\max m_{j_i}}{m_j} < 0.67 , \qquad \frac{\min p^2_{T,j_i} \, \Delta R^2_{j_1 j_2}}{m_j^2} \simeq \frac{\min p_{T,j_i}}{\max p_{T,j_i}} > 0.09 , \qquad p_{T,j_i} > 30 \text{ GeV} . \tag{6}$$

3. For all other splittings, identify the more massive subjet of the $j_{1,2}$ with j and remove the less massive
one. This splitting is then removed from the relevant splitting history of the fat jet.

4. Go to the next splittings with the parent subjet $j = j_1$ and, if applicable, $j = j_2$. The un-clustering
loop will stop once the last condition in Eq. (6) cannot be met anymore.

5. Reconstruct the Higgs mass from the jet mass $m_j$ of the relevant splitting(s). To remove effects from
soft radiation, underlying event, and pile-up the BDRS tagger employs a filtering stage described in
Sec. III A. The Higgs mass will then be reconstructed from filtered subjets.

The three conditions in Eq. (6) effectively reject typical soft and collinear QCD splittings, even though the
C/A algorithm only takes into account collinear structures. Differences between jet algorithms we will discuss
for example in Sec. II C or in Sec. II E.
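
The un-clustering loop described in the steps above can be sketched on a toy clustering tree. This is a simplified illustration, not the BDRS implementation: the `Subjet` class, the `merge` and `massless` helpers, and the stopping rule (a subjet without recorded daughters or with $p_T$ below the threshold) are assumptions, the symmetry condition is evaluated as $\min p^2_{T,j_i} \Delta R^2_{j_1 j_2} > y_\text{cut} \, m_j^2$ with an assumed $y_\text{cut} = 0.09$, and the filtering stage is left out.

```python
import math
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Subjet:
    """Toy node of a clustering tree: (E, px, py, pz) plus optional daughters."""
    p: Tuple[float, float, float, float]
    children: Optional[Tuple["Subjet", "Subjet"]] = None

    @property
    def pt(self):
        return math.hypot(self.p[1], self.p[2])

    @property
    def mass(self):
        e, px, py, pz = self.p
        return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

    @property
    def rapidity(self):
        return 0.5 * math.log((self.p[0] + self.p[3]) / (self.p[0] - self.p[3]))

    @property
    def phi(self):
        return math.atan2(self.p[2], self.p[1])

def massless(pt, y, phi):
    return Subjet((pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y)))

def merge(a, b):
    """E-scheme recombination that remembers its two daughters."""
    return Subjet(tuple(x + y for x, y in zip(a.p, b.p)), (a, b))

def delta_r(a, b):
    dphi = abs(a.phi - b.phi)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a.rapidity - b.rapidity, dphi)

def mass_drop_unclustering(jet, mu=0.67, y_cut=0.09, pt_min=30.0):
    """Return all parent subjets whose splitting passes the three conditions."""
    candidates, todo = [], [jet]
    while todo:
        j = todo.pop()
        if j.children is None or j.pt < pt_min:
            continue                                   # nothing left to un-cluster
        heavy, light = sorted(j.children, key=lambda s: s.mass, reverse=True)
        softer_pt = min(heavy.pt, light.pt)
        mass_drop = heavy.mass < mu * j.mass
        symmetric = softer_pt ** 2 * delta_r(heavy, light) ** 2 > y_cut * j.mass ** 2
        hard = softer_pt > pt_min
        if mass_drop and symmetric and hard:
            candidates.append(j)                       # keep this massive splitting
            todo.extend([heavy, light])                # and look inside both daughters
        else:
            todo.append(heavy)                         # drop the less massive daughter
    return candidates

# Toy fat jet: two hard massless subjets plus one soft emission merged last
b1, b2, g = massless(100.0, 0.0, 0.4), massless(100.0, 0.0, -0.4), massless(5.0, 0.0, 1.2)
higgs_like = merge(b1, b2)
fat = merge(higgs_like, g)
found = mass_drop_unclustering(fat)
print(len(found), round(found[0].mass, 1))
```

In the toy example the soft emission fails the conditions and is discarded, after which the symmetric two-prong splitting is identified as the only candidate.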



If the jet j is a candidate for the Higgs boson, the two immediate daughters $j_i$ should be bottom jets.
This means we can eventually apply two b-tags, one on each of the subjets inside the fat jet. This increases
the light-flavor QCD rejection by up to four orders of magnitude. It turns out that b-tagging of reasonably
boosted objects with a geometric separation of the order of $R_{bb} \sim 0.4$ shows an even better performance than
b-tagging of continuum jets because the boost aligns the B decay products [30]. The relatively small cut on
$m_{j_i}/m_j$ in Eq. (6) allows configurations where one of the two bottom decay jets still includes a non-collinear
gluon radiation, so in some situations a softer threshold of 0.9 might improve the signal-to-background
ratio [10].

As mentioned above, the results shown in Fig. 2 indicate that the success of any recombination algorithm
in fat jet analyses crucially depends on the treatment of underlying event and pile-up. The BDRS Higgs
tagger includes a filtering strategy which we will discuss in detail in Sec. III A. Inspired by Ref. [11] it defines
a finer geometric resolution $R_\text{filter} \sim R_{bb}/2$ and re-clusters the leading Higgs decay product on this scale. The
three leading subjets, corresponding to two bottom subjets and one, possibly hard, gluon subjet, it identifies
with the Higgs mass and momentum. If the original fat jet is of size R = 1.2 and the filtering scale is around
$R_\text{filter} = 0.3$ the effective area included after filtering is only $3 \times (0.3/1.2)^2 \approx 0.2$ of the original fat jet.
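
The area argument behind filtering is simple arithmetic, spelled out in the short sketch below. It assumes circular, non-overlapping subjets, so the numbers are estimates rather than exact catchment areas; the function name is hypothetical.

```python
def area_fraction(n_subjets, r_small, r_large):
    """Fraction of a jet of radius r_large covered by n_subjets discs of radius r_small."""
    return n_subjets * (r_small / r_large) ** 2

# Three filtered subjets of size 0.3 inside a fat jet of size 1.2
print(round(area_fraction(3, 0.3, 1.2), 4))
# Area ratio between the 0.81 and 0.25 cones of the early double-cone tagger
print(round(1.0 / area_fraction(1, 0.25, 0.81), 2))
```

Since uncorrelated soft activity scales with the jet area, both numbers indicate how much underlying event and pile-up contamination is removed by restricting the mass reconstruction to small subjets.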

One slight complication can arise when we use the BDRS Higgs tagger in a busy jet environment, like
for $t\bar{t}H$ searches [10]. Due to the high jet multiplicity in the event there might be more than one candidate
for a massive splitting inside the Higgs fat jet. To limit the combinatorial background we cannot simply
include all possible pairings. Instead, we only keep the (three) leading pairings ordered by the modified
JADE distance $p_{T,j_1} p_{T,j_2} (\Delta R_{j_1 j_2})^4$ with a bias towards large geometric separation. For the general cutoff
on all subjets considered a relatively large value $p_T > 40$ GeV will add to the QCD rejection [10].
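
The ranking of pairings by the modified JADE distance can be sketched as follows. The code assumes that "leading" means the largest values of $p_{T,j_1} p_{T,j_2} (\Delta R_{j_1 j_2})^4$ and that the $p_T$ cutoff is applied before pairing; the tuple layout and function names are hypothetical.

```python
import itertools
import math

def delta_r(a, b):
    """Distance in the (rapidity, azimuth) plane; a, b = (pT, y, phi)."""
    dphi = abs(a[2] - b[2])
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(a[1] - b[1], dphi)

def modified_jade(a, b):
    """pT1 * pT2 * DeltaR^4, favouring hard and widely separated pairs."""
    return a[0] * b[0] * delta_r(a, b) ** 4

def leading_pairings(subjets, n_keep=3, pt_min=40.0):
    """Indices of the n_keep pairings with the largest modified JADE distance."""
    hard = [(i, s) for i, s in enumerate(subjets) if s[0] > pt_min]
    pairs = [((i, k), modified_jade(a, b))
             for (i, a), (k, b) in itertools.combinations(hard, 2)]
    pairs.sort(key=lambda item: item[1], reverse=True)
    return [idx for idx, _ in pairs[:n_keep]]

# Hypothetical subjets: (pT in GeV, rapidity, azimuth); the last one fails the pT cut
subjets = [(100.0, 0.0, 0.0), (80.0, 0.0, 1.0), (50.0, 0.0, 0.2), (30.0, 0.0, 0.5)]
print(leading_pairings(subjets, n_keep=2))
```

The fourth power of the separation strongly suppresses nearby pairs, so the hard but close pairing of the first and third subjet ends up last in this example.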



It is important to notice that this Higgs tagger does not require a knowledge of the Higgs mass, which 
means that we can show $m_{bb}$ distributions for signal and background and search for a mass peak with
proper side-bins and all the associated analysis benefits. A detailed comparison between the BDRS results 
and the full ATLAS detector simulation reveals very few potential problems in Higgs tagging. The C/A 
un-clustering combined with the mass drop criterion and the b-tagging work at least as well as expected.
Potential problems are only charm-induced mis-tagging, which will not affect the top taggers discussed below, 
and the reconstructed Higgs mass window, which should be of the order of ±10%. 



B. YSplitter 



Early on, ATLAS developed a top tagger based on the structure of the splitting history $y_j$, as originally
suggested in Ref. [26]. It is usually referred to as YSplitter or 'ATLAS default tagger' [31]. The physics case
for this tagger is best illustrated by the signal process in the original publication [31], namely a heavy Z'
boson decaying to a semi-leptonic top pair. As usual, the lepton guarantees that the events are efficiently
triggered and reduces the jet combinatorics. As suggested in Fig. 1 YSplitter requires one hard fat jet with
$p_T > 300$ GeV. However, when varying the transverse momentum range, the cuts proposed in Ref. [31] give a broad
plateau in the top tagging efficiency for $p_{T,t} = 700 \ldots 1400$ GeV.




Figure 4: Three splitting points $y_i$ for fat jets from Z' decays with $m_{Z'} = 2$ TeV (solid) and 3 TeV (dashed). The
x-axis shows the three leading splitting points of the $k_T$-algorithm as $p_T \sqrt{y}/2$. Figures from Ref. [31].

The relevant parameters to identify heavy particles in the $k_T$ splitting history of the fat jet are for example
three $y$ values, as defined in Eq.(5), corresponding to the decay jets and one QCD jet or the b jet from
the leptonically decaying top. In Fig. 4 we show the leading three splitting points for the signal sample.
For the first two splittings, which correspond to the $t \to Wb$ and the $W \to jj$ decay steps, the mass of the
heavy resonance has a negligible effect. From the original W analysis we know that $p_T\sqrt{y}$ scales well with
the mass of the decaying massive particle. This mass variable $p_T\sqrt{y}/2$ is shown on the x-axis of Fig. 4 and
indeed reproduces $m_t/2$ and $m_W/2$ for the top decay steps. The problem with using these one-dimensional
distributions for a tagging algorithm is that they are very broad. For example the top mass reconstruction
only works to roughly $m_t/2 \sim (90 \pm 40)$ GeV. In addition, the correlation between the three extracted values
for $p_T\sqrt{y}$ is minor.

For the fourth splitting, which arises because of the additional b jet from the leptonic top decay and
continuum QCD jet radiation, we find typical values $p_T\sqrt{y} \sim 30$ GeV. This is much harder than expected
from QCD. In part this is due to the very heavy particle in the s-channel and the associated very hard
collinear radiation scale. This means this value increases even more for heavier Z' scenarios. While such an
effect of a generic hard process is understood from appropriate QCD simulations [35], it does not probe the
top content of the signal and is likely to be mimicked by the backgrounds once all other cuts have moved
them into a signal-like phase space region. Generally, the last splitting is in the range where jet substructure
algorithms typically have a soft cutoff on all subjets considered.

The original W tagger [55] applies a second cut on the W jet mass without any study of the correlation
between the jet mass and the splitting history cuts. For well isolated jets the reconstructed top jet mass
is much more narrow than the $p_T\sqrt{y}$ distributions, typically reproducing the input top mass to $\mathcal{O}(10\%)$.
Implementing cuts on two-dimensional correlations of $p_T\sqrt{y}$, $m_j$, and $p_{T,j}$ improves the signal efficiency
as well as the QCD background rejection. Therefore, in the most recent versions of the ATLAS default
tagger a neural net is employed for all measured parameters from the $k_T$ splitting history. It is worth noting
that a very important initial step in experimentally establishing subjet techniques has been achieved with
this tagger: in an ATLAS study it was shown that for signals with a heavy s-channel resonance the Z'
mass reconstruction from boosted top quarks is better than from the known reconstruction techniques of
semi-leptonic top quarks.

C. Seattle Tagger 

A second tagging algorithm based on the $k_T$-algorithm follows a different approach. Instead of studying
the splitting history the way YSplitter [55, 31] does it, it aims at removing all soft and collinear splittings as
expected from pure massless QCD [33] and then studying the remaining massive splitting candidates [25].

In the $k_T$-algorithm, two cases of preferred jet merging appear. On the one hand, if one subjet is much
softer than others, it should be merged into one of the harder subjets. This can be identified through a
$\min(p_{T,j_1}, p_{T,j_2})/p_{T,j}$ measurement. On the other hand, collinear splitting in QCD does not require one of
the jets to be soft, so also two jets with small $\Delta R_{j_1 j_2}$ should be merged. Any splitting in the $k_T$-history
which does not satisfy these two conditions is a candidate for a non-QCD splitting. The details of this
'pruning' procedure we will discuss in Sec. III C. Originally, it was meant to improve tagging algorithms
based on jet mass measurements both using the $k_T$ or the C/A jet algorithm [33]. In a second step, it can
be used to tag heavy particles inside of fat jets [55].

The Seattle or pruning top tagger includes three parameters: the jet mass $m_j$, the transverse momentum
drop $\min p_{T,j_i}/p_{T,j}$ as defined in Eq.Q, and the angular spread of the daughter subjets $\Delta R_{j_1 j_2}$. The details
of the pruning algorithm we discuss in Sec. III C. A comprehensive study first shows that the jet mass from
the C/A algorithm is more stable than from the $k_T$-algorithm. The reason is that according to Eq.Q we
can write the (parent) jet mass in terms of the distance measure of the $k_T$-algorithm

$$ m_j^2 \, \frac{z}{1-z} \simeq \left( p_{T,j} \, z \, \Delta R_{j_1 j_2} \right)^2 = d_{j_1 j_2} \quad (k_T \text{ algorithm}) . \tag{7} $$

This formula shows that soft or small-$z$ splittings will contribute disproportionally to the jet mass, leaving it
vulnerable to many problems with soft radiation detection and identification.
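
The collinear relation between the jet mass and the $k_T$ distance can be checked numerically. The following Python sketch is only an illustration under stated assumptions: the two subjets are taken to be massless, the parent transverse momentum is approximated by the scalar sum of the two subjet transverse momenta, and all function names are hypothetical. It compares the exact invariant mass with the estimate $m_j^2 \simeq d_{j_1 j_2} (1-z)/z$.

```python
import math

def exact_mass2(pt1, pt2, d_eta, d_phi):
    """Exact invariant mass squared of two massless subjets."""
    return 2.0 * pt1 * pt2 * (math.cosh(d_eta) - math.cos(d_phi))

def kt_distance(pt1, pt2, d_eta, d_phi):
    """k_T distance d = min(pT1, pT2)^2 * DeltaR^2."""
    return min(pt1, pt2) ** 2 * (d_eta ** 2 + d_phi ** 2)

def mass2_from_kt(pt1, pt2, d_eta, d_phi):
    """Collinear estimate m_j^2 ~ d * (1 - z) / z, with z = min(pT) / pT_parent.

    Assumption of this sketch: pT_parent is approximated by pt1 + pt2.
    """
    z = min(pt1, pt2) / (pt1 + pt2)
    return kt_distance(pt1, pt2, d_eta, d_phi) * (1.0 - z) / z

# Example: a soft splitting with z ~ 0.09 and DeltaR = 0.1
m2_exact = exact_mass2(300.0, 30.0, 0.06, 0.08)
m2_estimate = mass2_from_kt(300.0, 30.0, 0.06, 0.08)
print(math.sqrt(m2_exact), math.sqrt(m2_estimate))
```

For a fixed $k_T$ distance the factor $(1-z)/z$ grows for small $z$, which is the enhancement of soft splittings described above.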

Looking at the second observable, we find that the $p_T$ drop for the C/A algorithm prefers small values
of $\min p_{T,j_i}/p_{T,j} < 0.04$ even for boosted top decay jets. Unless we do something about it, this makes
top tagging in this regime difficult. Finally, the R separation of the last splitting in the case of the C/A
algorithm has to correspond to $\Delta R_{bW} > \Delta R_{jj}$ because of the geometric ordering of the recombination. For
the $k_T$-algorithm the R separation of the last splitting can be somewhat smaller. These basic differences can
be blurred for example by pruning or similar procedures discussed in Sec. III, but they nevertheless explain
why the BDRS Higgs tagger, based on the C/A jet algorithm, requires a drop in the jet mass and not in the
transverse momentum of the splittings [25]. In Sec. II E we will discuss the application of the C/A algorithm
in combination with a mass drop criterion for top tagging.



Following a detailed analysis the critical observables for the Seattle tagger are chosen to be jet masses, i.e.
the reconstructed top and W jet masses. A fat jet for example of size $R = 1.0$ is un-clustered iteratively, as
described for the BDRS Higgs tagger in Sec. II A. An additional pruning step ensures that only hard and
well separated splittings survive. For these candidate splittings the jet masses are compared to their known
values. Because of the problems with the jet mass reconstruction described above, the allowed deviations
for a top tag are of the order of $m_t \pm 14$ GeV and $m_W \pm 12$ GeV for the $k_T$-algorithm while for the C/A
algorithm they can be smaller, $m_t \pm 11$ GeV and $m_W \pm 8$ GeV. Nevertheless, due to the other issues with the
C/A algorithm Ref. [25] finds that a pruning-based top tagger shows a better performance when combined
with the $k_T$ algorithm, in particular towards larger $p_{T,j}$ values.

When comparing different top taggers it should be noted that the pruning or Seattle tagger can be viewed 
as a more general analysis tool for fat jets with suspected massive splittings. In addition, the pruning stage 
can be used in combination with any specialized Higgs [34] and top tagger [35] . 



D. Johns Hopkins Tagger 



The Johns Hopkins top tagger [35] was the first public top tagger applying the successful BDRS setup to
the two-step top decay. Because the additional kinematic conditions on now three decay jets should allow
for a more effective QCD and W+jets background rejection, an additional b-tag is not foreseen. However, it
can be added to it, as we discuss towards the end of Sec. II E.

In its original setup the tagger is optimized for relatively highly boosted top quarks, so for fat jet transverse
energies $E_{T,j} > 1$ TeV it starts with a C/A jet of size $R = 0.8$. Translated into transverse momentum a second
basic requirement is $p_{T,j} > 0.35\, E_{T,j}$. From Fig. 1 we see that boosted top quarks with $p_{T,t} > 350$ GeV
indeed require at least $R = 0.8$ to be fully included inside the fat jet. Following the BDRS recipe described
in Sec. II A the fat jet is then iteratively de-clustered. A splitting is kept as the candidate for one of the two
top decay steps if

$$ \frac{\min p_{T,j_i}}{p_T^\text{hard}} > 0.1 \qquad\qquad \Delta R_{j_1 j_2} > 0.19 . \tag{8} $$

A slight change with respect to the BDRS algorithm is that $p_T^\text{hard}$ is the same for all splittings. It is fixed
by the parent $p_{T,j}$ in the first relevant splitting and stays at this value for the remaining un-clustering
sequence. This way, the first condition in Eq.(8) also terminates the un-clustering loop and no additional
soft cutoff is required. Once two successive clusterings corresponding to the top and W decays are identified,
the algorithm also terminates.

The numerical values for the tunable parameters shown in Eq.(8) we quote for the softest fat jets considered
in Ref. [3HJ, i.e. $p_{T,t} > 350$ GeV. Comparing this condition to Eq.Q we see that the mass drop criterion in
the BDRS algorithm is replaced by a $p_T$ drop, as for example discussed in Sec. II C.
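
The splitting criterion with a fixed hard reference scale can be sketched in a few lines of Python. This is an illustration only: the function name and argument names are hypothetical, the default thresholds are the values quoted above for the softest fat jets, and the surrounding un-clustering loop of an actual jet library is not shown.

```python
def is_hard_splitting(pt_sub1, pt_sub2, delta_r, pt_hard,
                      min_pt_fraction=0.1, min_delta_r=0.19):
    """Check the two conditions for keeping a splitting as a decay candidate.

    pt_hard is the transverse momentum of the parent in the first relevant
    splitting; it is kept fixed for the whole un-clustering sequence.
    """
    pt_fraction = min(pt_sub1, pt_sub2) / pt_hard
    return pt_fraction > min_pt_fraction and delta_r > min_delta_r

# Example with a 400 GeV reference scale
print(is_hard_splitting(250.0, 120.0, 0.6, pt_hard=400.0))  # hard and well separated
print(is_hard_splitting(380.0, 20.0, 0.6, pt_hard=400.0))   # softer subjet below 10%
```

Because the reference scale does not shrink along the sequence, sufficiently deep splittings necessarily fail the first condition, which is why it also ends the loop.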

For signal events the iterative algorithm should stop after identifying three or four decay subjets, allowing 
for one final state gluon in the top decay. At this stage, three conditions have to be met by one combination 
of the jet momenta 

$$ m_{jjj} = m_t \pm 30~\text{GeV} \qquad m_{jj} = m_W \pm 15~\text{GeV} \qquad \cos\theta_h < 0.7 . \tag{9} $$

Figure 5: Left: Helicity angle distribution for top, gluon, and light-flavor quark jets fulfilling $p_T > 700$ GeV. All
other tagging cuts have been already imposed. Right: original tagging and mis-tagging efficiency estimates for top
and QCD jets as a function of the fat jet $p_T$. Figures from Ref. [36].



The angle $\theta_h$ is the top helicity angle, measured in the rest frame of the reconstructed W boson. It is defined
as the opening angle of the incoming top momentum and the softer of the two W decay subjets. In leptonic
top decays the corresponding W decay product is usually chosen as the lepton momentum, but for hadronic
W decays the two jets are indistinguishable. The helicity angle distributions for signal and backgrounds we
show in Fig. 5. For top jets it is essentially flat, while for massless QCD splittings it is strongly peaked
towards small angles. This corresponds to the collinear divergence in the splitting kernels which runs into a
$1/(1 - \cos\theta_h)$ pole. In other words, the top helicity angle does not act to reject top decays with wrong
angular correlations; it is simply another independent observable of the three top decay jets in addition to
the two mass constraints.
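
As an illustration of this definition, the helicity angle can be computed from three subjet four-momenta with a few lines of Python. The sketch below rests on assumptions that are not fixed by the text: four-vectors are tuples (E, px, py, pz), "softer" is decided by the lab-frame energy, and the function names are hypothetical.

```python
import math

def boost(p, beta):
    """Components of the four-vector p = (E, px, py, pz) in a frame moving with velocity beta."""
    b2 = sum(b * b for b in beta)
    if b2 == 0.0:
        return p
    gamma = 1.0 / math.sqrt(1.0 - b2)
    bp = sum(b * x for b, x in zip(beta, p[1:]))
    coeff = (gamma - 1.0) * bp / b2 - gamma * p[0]
    return (gamma * (p[0] - bp),) + tuple(x + coeff * b for x, b in zip(p[1:], beta))

def add(p, q):
    """Sum of two four-vectors."""
    return tuple(a + b for a, b in zip(p, q))

def cos_helicity_angle(b_jet, w_jet1, w_jet2):
    """cos(theta_h) between the top momentum and the softer W subjet in the W rest frame."""
    w = add(w_jet1, w_jet2)
    top = add(w, b_jet)
    beta_w = tuple(x / w[0] for x in w[1:])
    # Assumption: "softer" refers to the lab-frame energy of the subjet.
    softer = w_jet1 if w_jet1[0] < w_jet2[0] else w_jet2
    top_rf = boost(top, beta_w)[1:]
    soft_rf = boost(softer, beta_w)[1:]
    dot = sum(a * b for a, b in zip(top_rf, soft_rf))
    norm = math.sqrt(sum(a * a for a in top_rf)) * math.sqrt(sum(a * a for a in soft_rf))
    return dot / norm

def passes_helicity_cut(b_jet, w_jet1, w_jet2, max_cos=0.7):
    """Third condition of the tag: cos(theta_h) < 0.7."""
    return cos_helicity_angle(b_jet, w_jet1, w_jet2) < max_cos
```

In the W rest frame the top momentum coincides with the b momentum, so the same angle could equivalently be measured with respect to the b subjet.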

Experimental QCD effects like underlying event or pile-up will eventually require some kind of filtering
stage in the algorithm, as we will describe in Sec. III and as already included in the HEPTopTagger discussed
below. This modification is straightforward. In addition, the identification of the top helicity angle as one
of the kinematic observables means that the tagging algorithm identifies the b-jet from the top decay. This
would allow us to immediately add a b-tag to the Johns Hopkins tagger [35], if we are interested in additional
background rejection.

In the right panel of Fig. 5 we show the original estimates for the tagging efficiencies. While the final
numbers will most likely change significantly after including pile-up, some kind of filtering, and detector
effects, we do observe a clear structure in the signal efficiency: for low transverse momenta the tagging
efficiency decreases mostly due to a limited number of top jets fully included in the original $R = 0.8$ fat
jet. To some degree this can be improved by allowing larger R values as one of the tuning parameters of
the tagger [35]. The drop towards large transverse momenta has a two-fold reason: for strongly boosted top
quarks eventually the calorimeter resolution will limit the tagging performance. Even before this, once we
arrive at $m_W \ll p_{T,t}$ some of the fat jet splitting history will be dominated by soft kinematics. In that case
the C/A re-clustering will fail to reconstruct the correct top decay products [35]. In this case we should
switch to the $k_T$-algorithm for the top reconstruction stage.

The CMS tagger [37] is essentially a Johns Hopkins tagger with very few modifications. The algorithm is
exactly the same, but the three kinematic conditions defined in Eq.(9) are replaced by

$$ m_{jjj(j)} = 100...250~\text{GeV} \qquad \min m_{jj} > 50~\text{GeV} . \tag{10} $$

The second condition means that we first identify the three hardest subjets inside the fat jet, then combine
them into three two-jet pairs, and finally require that all of those have a sufficiently large jet mass. The CMS
study tests that indeed the C/A jet algorithm leads to the best results. An obvious and critical check of the
CMS tagger would be an observation of tops in Standard Model top pair events with a top mass peak in the 

E. HEPTopTagger



Similar to the Johns Hopkins tagger the HEPTopTagger [35] (Heidelberg-Eugene-Paris) makes use of
the BDRS setup and generalizes it to the multi-step top decay structure. It was originally used to study $t\bar{t}H$
searches with two fat jets in a high-multiplicity environment, one from the Higgs and the other one from
the top [TU]. Its public version is described in detail in the Appendix of Ref. [8]. Additional improvements,
discussed in Ref. [35], we will briefly discuss at the end of this section. The reference analyses for the
HEPTopTagger are associated top-Higgs production [5J and supersymmetric scalar top pairs decaying to top
quarks and missing energy [35]. Hence, this tagger needs to aim at considerably lower $p_{T,t}$ values. This
makes it a promising tool to establish subjet techniques and top tagging on Standard Model top samples at
the 7 TeV LHC.

From Fig. 1 we can see that starting a top tagger with C/A jets of size $R = 1.5$ should allow us to access top
quarks down to $p_{T,t} \sim 200$ GeV. For example looking at Standard Model top pairs an increase from $R = 0.9$
to $R = 1.5$ means that twice as many top quarks are accessible even for relatively large $p_{T,t} \sim 400$ GeV. On
the negative side, this increase in the jet area poses two problems for the tagging algorithm. First, subjet
combinatorics will increase and it will get harder to identify the individual top decay products. Second,
pile-up will become a huge problem, so the HEPTopTagger always includes a filtering stage as described in
Sec. III.

The tagging algorithm proceeds in steps similar to the BDRS Higgs tagger with its mass drop criterion 
Eq.jjj): 

1. Un-doing the last clustering of the jet $j$, the mass drop criterion $\min m_{j_i} < 0.8\, m_j$ determines if we
keep $j_1$ and $j_2$. Subjets with $m_{j_i} < 30$ GeV are not considered, which eventually ends the iterative
un-clustering. A symmetry requirement, as in the BDRS algorithm, is not included.

2. Apply a filtering stage to construct one three-subjet combination with a jet mass within $m_t \pm 25$ GeV.

3. Order these three subjets by $p_T$. If their jet masses ($m_{12}, m_{13}, m_{23}$) satisfy one of the following three
criteria, accept them as a top candidate:

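$$
\begin{aligned}
& 0.2 < \arctan \frac{m_{13}}{m_{12}} < 1.3 \quad \text{and} \quad R_\text{min} < \frac{m_{23}}{m_{123}} < R_\text{max} \\
& R_\text{min}^2 \left( 1 + \frac{m_{13}^2}{m_{12}^2} \right) < 1 - \frac{m_{23}^2}{m_{123}^2} < R_\text{max}^2 \left( 1 + \frac{m_{13}^2}{m_{12}^2} \right) \quad \text{and} \quad \frac{m_{23}}{m_{123}} > R_\text{soft} \\
& R_\text{min}^2 \left( 1 + \frac{m_{12}^2}{m_{13}^2} \right) < 1 - \frac{m_{23}^2}{m_{123}^2} < R_\text{max}^2 \left( 1 + \frac{m_{12}^2}{m_{13}^2} \right) \quad \text{and} \quad \frac{m_{23}}{m_{123}} > R_\text{soft}
\end{aligned}
\tag{11}
$$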



4. For consistency, require the combined $p_T$ of the three subjets to be above 200 GeV.

The dimensionless mass windows $R_\text{min} = 85\% \times m_W/m_t$ and $R_\text{max} = 115\% \times m_W/m_t$ are tunable and will
be optimized by the experimental collaborations. The soft cutoff $R_\text{soft} = 0.35$ removes QCD events which
the C/A algorithm cannot correctly identify as soft radiation.
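
The mass plane selection of step 3 can be sketched as a small Python function. This is a minimal illustration and not the public tagger code: the function name is hypothetical, the input masses `M_TOP` and `M_W` are assumed values, and the second and third criteria are written in a multiplied-out form to avoid dividing by a vanishing subjet pair mass.

```python
import math

M_TOP, M_W = 173.0, 80.4  # assumed input masses in GeV

def passes_mass_plane(m12, m13, m23, m123,
                      r_min=0.85 * M_W / M_TOP,
                      r_max=1.15 * M_W / M_TOP,
                      r_soft=0.35):
    """Return True if the pT-ordered subjet masses satisfy one of the three criteria."""
    ratio23 = m23 / m123
    # Criterion 1: the pair (2,3) reconstructs the W mass.
    if 0.2 < math.atan2(m13, m12) < 1.3 and r_min < ratio23 < r_max:
        return True
    if ratio23 <= r_soft:
        return False
    rest = 1.0 - ratio23 ** 2
    pair_sum = m12 ** 2 + m13 ** 2
    # Criterion 2: the pair (1,2) reconstructs the W mass.
    if r_min ** 2 * pair_sum < rest * m12 ** 2 < r_max ** 2 * pair_sum:
        return True
    # Criterion 3: the pair (1,3) reconstructs the W mass.
    return r_min ** 2 * pair_sum < rest * m13 ** 2 < r_max ** 2 * pair_sum

# Example: massless subjets with m23 = m_W and m12 = m13
m_sym = math.sqrt((M_TOP ** 2 - M_W ** 2) / 2.0)
print(passes_mass_plane(m_sym, m_sym, M_W, M_TOP))
```

For massless subjets $m_{123}^2 = m_{12}^2 + m_{13}^2 + m_{23}^2$, so the second and third criteria reduce to $R_\text{min} < m_{12}/m_{123} < R_\text{max}$ and $R_\text{min} < m_{13}/m_{123} < R_\text{max}$, respectively.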

Compared to the Johns Hopkins or CMS taggers there are four main differences. First, the HEPTopTagger
determines the entire clustering history and does not enforce exactly two hard splittings. Second, it is
based on a mass drop criterion instead of a transverse momentum drop. As a matter of fact, the entire
HEPTopTagger algorithm is only based on jet masses. The C/A algorithm should reconstruct those very
well after filtering. Third, in the left panel of Fig. 6 we see that it is quite likely that two invariant mass
combinations of the three top decay products will be in the $m_W$ range. This is due to the specific top
kinematics with its endpoint maxmL. < rnj — m^, which is numerically close to [38) . Consequently, 



l bji ^ "H ~ " l Wi 

we avoid assigning two subjets to the W decay and instead impose a symmetric condition, like the one given
in Eq.(11). Finally, similar to the BDRS approach the HEPTopTagger uses filtered subjets to reconstruct
the top decay products. This has significant impact on the performance in particular on data, as discussed
in Sec. III.

Figure 6: Events in the $\arctan m_{13}/m_{12}$ vs $m_{23}/m_{123}$ plane for $t\bar{t}$ (left), W+jets (center) and pure QCD jets (right)
samples. More densely populated regions of the phase space appear in red. Figures from Ref. [8].



No matter if the new physics signature is a massive s-channel resonance or a decaying supersymmetric 
top squark we need to rely on the top tagger to not only identify top quarks but also reconstruct their 
4-momenta. For the HEPTopTagger the quality of this momentum reconstruction has been studied in detail. 
The question if the top tagger really reconstructs all top decay products is surprisingly irrelevant for this test 
— a generic tagger will always be fairly likely to correctly assign the hardest two top decay products, while 
the softer W decay subjet will contribute little to the reconstructed top momentum. This is the reason why 
even for moderately boosted tops 95% of the tagged events show a correctly reconstructed direction within
$\Delta R = 0.5$; for more than 80% of the tops the momentum is reconstructed within 20% of the Monte Carlo
truth. For stop pairs decaying to fully reconstructed hadronic tops and missing energy this means we can 
apply the usual mjj methods to measure the supersymmetric masses in the process. 



In a subsequent study, possible improvements to the HEPTopTagger are tested [35]. Similar results
should apply to other top taggers, like the Johns Hopkins tagger. As a first improvement, we can use the
$k_T$-algorithm in the filtering and the re-clustering stages. For the re-clustering this stabilizes the HEPTopTagger
tagging efficiency on a plateau between $p_{T,t} = 200$ GeV and 600 GeV, where the C/A version shows a significant
drop in performance. Second, as discussed in Sec. III D we can add a pruning stage parallel to the filtering and
then include both reconstructed top masses in the selection. Compared to the critical pure QCD background
this can improve the signal-to-background ratio by a factor of two. In contrast, increasing the size of the fat
jet to $R = 1.8$ has little benefit even for Standard Model top pairs because subjet combinatorics compensate
the possible benefits [footnote 2]. Finally, simply adding a b-tag at the end of the top tagging algorithm can improve
the background rejection. Including the b-tagging information inside the HEPTopTagger algorithm does not
appear promising.



A combination of the HEPTopTagger and a (sub-)jet count, as we will describe in Sec. II G, is presented
in Ref. [39]. It first defines an anti-$k_T$ fat jet of size $R = 1.5$, reclusters it as a C/A jet with a mass drop
criterion, applies the usual top and W mass constraints and then adds a b-tag. As an additional criterion it
uses an $R = 0.6$ anti-$k_T$ jet algorithm on the same event and compares the number of jets per event. Only
events with two fat jets of size $R = 1.5$ and at least three smaller jets of size $R = 0.6$ are kept as top pair
candidates. The reasoning behind this is that QCD dijet events with two hard jets will show two hard jet
structures independent of the jet size while high-multiplicity top jets will show an increased number of jets.



Footnote 2: It does increase the absolute number of signal events with constant S/B, so it might be useful in the first stage of an
experimental test 1041 . 

Figure 7: $\Delta R$ between the reconstructed and the parton-level top quark in $t\bar{t}$ events (left), $\Delta p_T/p_T^\text{rec}$ for the same
sample (center) and $\Delta|\vec{p}|/|\vec{p}|^\text{rec}$ (right). For the solid curves we only apply the default cut $p_T^\text{rec} > 200$ GeV while the
dashed curves require $p_T^\text{rec} > 300$ GeV. Figures from Ref. [8].



F. Thaler-Wang Tagger



Around the same time when the Johns Hopkins tagger adopted the BDRS approach for boosted top quarks,
the Thaler-Wang tagger took a different approach. It describes a subjet splitting in terms of the jet mass $m_j$
of the parent subjet and the subjet energy drop in the splitting. The definition of the jet energy drop is not
unique, so it can be implemented into a tagging algorithm in different ways, all equivalent in the massless
and collinear limit



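$$ z = \frac{\min E_{j_i}}{E_j} \, , \qquad \frac{d_{j_1 j_2}}{d_{j_1 j_2} + m_j^2} \, , \qquad \frac{\min (p_{j_i} \cdot p_\text{ref})}{p_j \cdot p_\text{ref}} \tag{12} $$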



where $d_{j_1 j_2} = \min(p_{T,j_i}^2) \, \Delta R_{j_1 j_2}^2$ is the distance measure in the $k_T$-algorithm introduced in Eq.(2) and $p_\text{ref}$
is a free reference 4-vector, for example the direction of one of the incoming protons. This energy drop is
expected to only be weakly correlated with the jet mass, which simplifies the tagging algorithm.



Because all definitions in Eq.(12) coincide in the collinear limit it is unlikely that we will be able to compare
their performance inside a tagger on Monte-Carlo data, which is generated with a parton shower. For top
jets simple simulations show that the second two definitions are essentially equivalent while the actual energy
ratio has a significantly softer $z$ spectrum. For QCD jets all definitions are equally strongly peaked towards
small $z$ values, but the energy drop has much smaller tails for $z > 0.3$ [10].

To extract massive splittings the Thaler-Wang tagger starts with an anti-$k_T$ jet of size $R = 1$. Of this
fat jet only the regions with the hardest jets are labelled and re-clustered with a $k_T$-algorithm. To apply
the tagger to LHC data in the presence of underlying event and pile-up it needs to be supplemented with a
trimming stage to remove soft calorimeter activity, as described in Sec. III B.

The numerical values in the tagging criteria are optimized for highly boosted top quarks with $p_{T,t} > 800$ GeV
at least, where we require jet mass windows and large energy ratios for example using the first definition in
Eq.(12):

$$ m_j = 160...200~\text{GeV} \qquad m_{jj} = 60...100~\text{GeV} \qquad z > 0.1 \tag{13} $$

where the W mass constraint has to be fulfilled by one subjet combination and the $z$ value is extracted from
the $t \to Wb$ decay step. Both observables are shown in Fig. 8 and show a clear difference for signal and
backgrounds.

To this stage the Thaler-Wang tagger does not yet include a distinctive feature of a three-body decay.
Therefore, it is combined with the classical sphericity event shape. The sphericity tensor [41] on the
two-dimensional plane transverse to the boost direction is defined in terms of calorimeter objects

$$ S_\perp^{kl} = \frac{\sum_a p_{\perp,a}^k \, p_{\perp,a}^l}{\sum_a |\vec{p}_{\perp,a}|^2} \tag{14} $$

Figure 8: Reconstructed kinematics variables for top jets and QCD jets in the top mass window of Eq.(13) and after
an additional cut of $p_T > 1200$ GeV on the fat jet: original energy ratio $z$ (left), $m_{jj}$ for the W candidate subjets
(center), and the determinant of the transverse sphericity (right). Figures from Ref. [40].



It is constructed out of the transverse momentum components of all energy depositions $a$, perpendicular
to the jet momentum. To avoid constructing the tensor with explicit coordinates the actual observable is
its determinant. For two-body kinematics $\det S_\perp^{kl}$ is zero. For three-body decays it corresponds to two
finite tensor eigenvalues summing to unity and hence ranges within 0...0.25. In the right panel of Fig. 8 we
see that top jets show a clear bias towards large $\det S_\perp$ values, but that the background uncertainties are
significant. In addition, event shapes will even more than jet masses be affected by pile-up and the way we
remove it [14].
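
The two limiting values of the determinant are easy to verify. The Python sketch below is an illustration under assumptions: each energy deposition is represented only by its two momentum components transverse to the jet axis, the tensor is normalized by the sum of the squared transverse momenta so that its trace is one, and the function name is hypothetical.

```python
import math

def det_transverse_sphericity(transverse_momenta):
    """Determinant of the normalized 2x2 transverse sphericity tensor.

    transverse_momenta: list of (px, py) components perpendicular to the jet axis.
    """
    norm = sum(px * px + py * py for px, py in transverse_momenta)
    s_xx = sum(px * px for px, _ in transverse_momenta) / norm
    s_yy = sum(py * py for _, py in transverse_momenta) / norm
    s_xy = sum(px * py for px, py in transverse_momenta) / norm
    return s_xx * s_yy - s_xy * s_xy

# Two-body kinematics: all momenta lie on one line in the transverse plane.
two_body = [(1.0, 0.0), (-1.0, 0.0)]
# Symmetric three-body configuration with 120 degree opening angles.
three_body = [(math.cos(a), math.sin(a))
              for a in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)]
print(det_transverse_sphericity(two_body), det_transverse_sphericity(three_body))
```

The symmetric three-body configuration has two equal eigenvalues of 1/2 and therefore saturates the upper end of the allowed range.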



G. N-Subjettiness 

N-jettiness [42] is an event shape which describes the number of isolated jets in an event. It can be
adapted as N-subjettiness to count subjets inside a fat jet [35144"5] . Relative to N subjet directions $\hat{n}_k$ it is
defined as

$$ \tau_N = \frac{1}{R_0^\beta \sum_a p_{T,a}} \sum_a p_{T,a} \min_{k=1,...,N} \left( \Delta R_{k,a} \right)^\beta \tag{15} $$

with an arbitrary weighting exponent $\beta > 0$, to ensure infrared safety. The normalization factor, which includes
the size $R_0$ of the fat jet, limits $\tau_N$ to the interval 0...1. In the first version of the tagging algorithm [44] these N axes are defined through a
subjet algorithm. In a modified version [45] they are defined in analogy to the thrust event shape, namely
as a minimization of the numerical value for $\tau_N$.

Fat jets with large values $\tau_N \to 1$ have many calorimeter clusters far away from the N main axes, which
means they consist of at least N + 1 well separated subjets. In the ratio $\tau_{N+1}/\tau_N$ typical QCD effects will
drop out, and the ratio will develop a dip for events which have N + 1 subjets.
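
A direct evaluation of N-subjettiness for given axes can be sketched in Python as follows. The sketch assumes the normalization $R_0^\beta \sum_a p_{T,a}$ with the fat jet size $R_0$, takes the axes as an input instead of determining them with a subjet algorithm or a minimization, and uses hypothetical function names and a toy jet.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance with the azimuthal difference wrapped into [-pi, pi)."""
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(eta1 - eta2, dphi)

def n_subjettiness(constituents, axes, r0=1.0, beta=1.0):
    """tau_N for constituents (pt, eta, phi) relative to N axes (eta, phi)."""
    norm = r0 ** beta * sum(pt for pt, _, _ in constituents)
    weighted = sum(
        pt * min(delta_r(eta, phi, ax_eta, ax_phi) for ax_eta, ax_phi in axes) ** beta
        for pt, eta, phi in constituents
    )
    return weighted / norm

# Toy fat jet: three hard prongs plus one soft constituent
hard = [(100.0, 0.3, 0.0), (80.0, -0.3, 0.2), (60.0, 0.0, -0.4)]
jet = hard + [(5.0, 0.1, 0.1)]
axes3 = [(eta, phi) for _, eta, phi in hard]
tau3 = n_subjettiness(jet, axes3)
tau2 = n_subjettiness(jet, axes3[:2])
print(tau3, tau2, tau3 / tau2)
```

For this three-prong toy configuration the third axis absorbs one hard constituent, so $\tau_3$ is much smaller than $\tau_2$ and the ratio $\tau_3/\tau_2$ is small.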

Because of the largely unknown QCD effects the value of $\tau_3$, i.e. the quality of the three-subjet hypothesis, is
not the best discriminator of top jets as compared to QCD jets. The $\tau_N$ distributions using the minimization
criterion and $\beta = 1$ tend to peak in the $\tau_1 = 0.2...0.25$, $\tau_2 = 0.07...0.1$, or $\tau_3 = 0.04...0.05$ regimes, where
the lower values are given for QCD jets and the upper values are reached by top jets [IS]. However, the
QCD-induced widths of the distributions are consistently larger than the peak differences.

In Fig. 9 we instead show two ratios of N-subjettiness values. The two constructions of the N reference
directions give very similar results, with a little bias towards smaller ratios for the explicit minimization
condition. For top decays producing three separated subjets the ratio $\tau_3/\tau_2$ is expected to drop, compared
to the QCD case. Indeed, we see a significantly lower signal peak than background peak in $\tau_3/\tau_2$, even
though this is at least as much due to an increase of the background peak as a decrease of the signal peak
compared to $\tau_2/\tau_1$.

Figure 9: N-subjettiness distributions for signal and background. Both methods for extracting the N reference
momenta are shown. Both panels use $\beta = 1$ in the definition of $\tau$, which turns out to give the best tagging
performance. Figures from Ref. [45].



The associated top tagging algorithm is based on an anti-$k_T$ jet of size $R = 1.0$ and with $p_T > 200$ GeV.
The choice of jet algorithm reflects the fact that the clustering history will not be part of the top selection
criteria. Instead, it uses two basic jet shape requirements on the top jet mass and on the ratio of subjettiness
values

$$ m_\text{fat jet} = 160...240~\text{GeV} \qquad \frac{\tau_3}{\tau_2} < 0.6 . \tag{16} $$

Because the fat jet mass is not corrected for soft QCD and pile-up its upper limit is larger than usual. The
efficiencies obtained for different methods of reconstructing the N reference directions and for $\beta = 1...2.5$
only slightly differ, likely within the uncertainties induced by QCD and detector effects.



An obvious extension of the tagging criteria Eq.(16) would be including all $\tau_N$ and $\tau_N/\tau_{N-1}$ measures for
$N = 1, 2, 3$ and $\beta = 1, 2$. For fixed efficiencies this reduces the mis-tag rate by roughly 20%.



H. Alternative jet shapes 



After discussing a set of specialized top taggers which are currently being tested by ATLAS and CMS we
have to add a few more general approaches. For example, the template method based on jet shapes or the
pure counting based tree-less approach are likely not going to be the leading top tagging tools used at the
LHC. However, their ideas might well prove useful when the experimental task at the LHC goes beyond
identifying known Standard Model particles and features.

The template method for top tagging [46] relies on anti-$k_T$ jets of size $R = 0.5$ and a jet energy in the
1 TeV range. In a similar ansatz [17] this is replaced by a cut on the transverse momentum of the leading
jet of at least $p_{T,j} > 1$ TeV. In addition, the fat jet mass has to lie in the 160...190 GeV range. Relevant
additional observables are then included as an overlap of measured correlations on the calorimeter level and
different parton-level templates, weighted by the geometric energy deposition.

Possible additional observables used in this top tagging study are jet shapes. Event shapes like thrust or
the eigenvalues of the sphericity tensor Eq.(14) can be used on the content of geometrically large jets and
their constituents. In that framework they are often referred to as jet shapes. A jet shape which is essential
for all top tagging algorithms is the jet mass. A major theoretical issue is if jet shapes are infrared safe, which
we will skip in this discussion [24]. Obviously, this question also includes the underlying jet algorithms.

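An essentially equivalent alternative to the sphericity is the planar flow. It is derived from the tensor $I_w$
and its two eigenvalues $\lambda_{1,2}$ [48]

$$ I_w^{kl} = \frac{1}{m_\text{jet}} \sum_a \frac{p_{a,k} \, p_{a,l}}{E_a} \qquad\qquad P = \frac{4 \det I_w}{(\text{tr}\, I_w)^2} = \frac{4 \lambda_1 \lambda_2}{(\lambda_1 + \lambda_2)^2} \tag{17} $$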

For only two constituents $P$ again vanishes, as it does for any kind of linear geometry. For a generic
three-body decay it can assume any value between zero and one. For example requiring $P > 0.5$ enhances the
number of top jets over the QCD background. In practice, the template tagger uses a correlated cut in 
the template overlap vs planar flow plane. Given that the overlap measure includes the full kinematic event 
information it might be possible to further improve it in the direction of the so-called matrix element method 
of log-likelihood ratios. 
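
The limiting values of the planar flow can be illustrated with a short Python sketch. It assumes that every constituent is given by its energy and its two momentum components transverse to the jet axis; the function name is hypothetical, and the overall factor $1/m_\text{jet}$ is dropped because it cancels in the ratio.

```python
import math

def planar_flow(constituents):
    """Planar flow P = 4 det(I) / tr(I)^2 for constituents (energy, px, py).

    px and py are the momentum components transverse to the jet axis.
    """
    i_xx = sum(px * px / e for e, px, _ in constituents)
    i_yy = sum(py * py / e for e, _, py in constituents)
    i_xy = sum(px * py / e for e, px, py in constituents)
    det = i_xx * i_yy - i_xy * i_xy
    trace = i_xx + i_yy
    return 4.0 * det / trace ** 2

# Linear geometry: two constituents back-to-back in the transverse plane.
linear = [(1.0, 1.0, 0.0), (1.0, -1.0, 0.0)]
# Symmetric three-body geometry with equal energies.
symmetric = [(1.0, math.cos(a), math.sin(a))
             for a in (0.0, 2.0 * math.pi / 3.0, 4.0 * math.pi / 3.0)]
print(planar_flow(linear), planar_flow(symmetric))
```

The linear configuration gives a vanishing planar flow, while the symmetric three-body configuration reaches the upper end of the range.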

Yet another class of jet shapes which we can use to describe two-body as well as three-body configurations
are angularities [49, 50]. In the template method they are only included for Higgs tagging, but they can also
be used to improve top tagging. For different weights $a$ the angularity is defined as

$$ \tau_a = \frac{1}{m_j} \sum_i E_i \, \sin^a \left( \frac{\pi \theta_i}{2R} \right) \left[ 1 - \cos \left( \frac{\pi \theta_i}{2R} \right) \right]^{1-a} \tag{18} $$

in terms of the angle $\theta_i$ with respect to the main axis. The correction factor $\pi/(2R)$ includes the jet size $R$
and ensures that for the maximum value $\theta_i = R$ the argument of the trigonometric functions does not exceed
the hemisphere limit $\pi/2$ from earlier $e^+e^-$ applications. Infrared safety limits the range of angularities
to $-\infty < a < 2$. For $a = 0$ we find that $1 - \tau_0$ turns into thrust [23], while for $a = 1$ it becomes jet
broadening [51]. Because for each value $a$ the angularity is a simple number we can correlate it with other
observables, like for example the azimuthal angle between the W decay subjets, and search for structures in
such distributions.

A second alternative approach to top tagging, explicitly not based on the clustering history, is the tree-less
substructure analysis [52]. Unlike for example the N-subjettiness it includes angular correlations. From the
JADE distance measure we know that angular separation can be closely linked to invariant masses of
subjet combinations.

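The geometric correlations between all possible pairs of subjets can be analyzed in terms of the angular
structure function and its numerical derivative

$$ G(R) = \frac{\sum_{a \ne b} d_{ab}^\text{(JADE)} \, \Theta(R - \Delta R_{ab})}{\sum_{a \ne b} d_{ab}^\text{(JADE)}} \qquad\qquad \Delta G(R) = R \, \frac{\sum_{a \ne b} d_{ab}^\text{(JADE)} \, K(R - \Delta R_{ab})}{\sum_{a \ne b} d_{ab}^\text{(JADE)} \, \Theta(R - \Delta R_{ab})} \tag{19} $$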

The function $K$ is nothing but a delta distribution of finite width, e.g. $K(x) = e^{-x^2/R_0^2}/(\sqrt{\pi} R_0)$ with $R_0 = 0.6$. It
fixes a typical R distance between two subjets. For values $R = R_*$ corresponding to observed subjet pairs
inside the fat jet the function $G(R)$ makes a step and $\Delta G(R)$ develops a peak. Top decays with three hard
decay subjets will show three such peak values $R_{k*}$ with $k = 1, 2, 3$, each corresponding to one side of the
triangle defined by three subjets. The number of observed peaks we call $n_p$. For each of the peaks we define
a mass value

m * = i; E 4^ E) ^K(R^AR nn ) (20) 

where the JADE distance is defined in Eq. ^ . For massive particle decays this mass variable scales with 
the invariant mass of the parent subjet. 
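
The step and peak structure of the angular structure function can be made explicit with a toy example. The Python sketch below is an illustration under several assumptions: subjets are given as (pT, eta, phi), the pair weight is taken to be the JADE-type combination $p_{T,a} p_{T,b} \Delta R_{ab}^2$, the kernel is a normalized Gaussian whose width is a free parameter of the sketch, and all function names are hypothetical. Summing over unordered pairs instead of $a \ne b$ does not change the ratios.

```python
import math
from itertools import combinations

def delta_r(a, b):
    """Angular distance between two subjets given as (pt, eta, phi)."""
    dphi = (a[2] - b[2] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a[1] - b[1], dphi)

def jade_distance(a, b):
    """JADE-type pair weight pT_a * pT_b * DeltaR_ab^2 (assumed form)."""
    return a[0] * b[0] * delta_r(a, b) ** 2

def kernel(x, width):
    """Normalized Gaussian, i.e. a delta distribution of finite width."""
    return math.exp(-(x / width) ** 2) / (math.sqrt(math.pi) * width)

def structure_function(subjets, r):
    """G(R): weighted fraction of subjet pairs with DeltaR_ab <= R."""
    pairs = list(combinations(subjets, 2))
    total = sum(jade_distance(a, b) for a, b in pairs)
    inside = sum(jade_distance(a, b) for a, b in pairs if delta_r(a, b) <= r)
    return inside / total

def structure_derivative(subjets, r, width):
    """Delta G(R): peaks where R matches the separation of a subjet pair."""
    pairs = list(combinations(subjets, 2))
    inside = sum(jade_distance(a, b) for a, b in pairs if delta_r(a, b) <= r)
    if inside == 0.0:
        return 0.0  # no pair inside R yet
    smeared = sum(jade_distance(a, b) * kernel(r - delta_r(a, b), width)
                  for a, b in pairs)
    return r * smeared / inside

# Toy three-subjet configuration with pair separations 0.5, 1.0 and about 1.12
subjets = [(100.0, 0.0, 0.0), (80.0, 0.5, 0.0), (60.0, 0.0, 1.0)]
for r in (0.3, 0.5, 0.75, 1.0, 2.0):
    print(r, structure_function(subjets, r), structure_derivative(subjets, r, width=0.06))
```

In this toy case $G(R)$ rises in steps from zero to one, and $\Delta G(R)$ is large at the pair separations and strongly suppressed in between, provided the kernel width is small compared to the spacing of the peaks.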

In Fig. 10 we show the peak positions and their associated mass values for three-subjet signal and background
configurations. For QCD backgrounds the R* distributions are broad and essentially scale invariant.
The distributions point towards small values, even though their typical values increase typically by a
factor two for increasing points. In contrast, for top jets the R* distributions are peaked. Their mass scales
correspond to the given decay kinematics, as for example discussed in Sec. II E.



The associated tree-less top tagging algorithm starts with a fat C/A jet of size R = 1.5. From the peaks
in the ΔG spectrum we then extract one, two or three hard subjets. There exist different sets of cuts,
depending on the transverse momentum of the fat jet and the number of peaks. We quote the cuts applied
to events with three subjet structures and p_T = 300...400 GeV. The original uncorrected fat jet mass and
two peak-associated mass values m* have to fulfill

TOf at j Ct > 102 GeV m 2 * > 26 GeV m 2 * > 79 GeV , (21) 

Figure 10: Peak positions and associated masses for n_p = 3 and fat jets with p_T = 500...600 GeV. Shown are
normalized distributions for the top signal (blue) and QCD backgrounds (red). Figure from Ref. [52].

In addition, the angular correlations have to satisfy

R_1^* < 0.81 \qquad R_2^* < 1.03 \qquad R_3^* < 2.11 . \qquad (22)



While the two taggers discussed above might not give the best efficiency for the usual signatures, they
have the advantage of being much more general than some of the established taggers. If jet shapes should
indeed turn out to be powerful QCD analysis tools at the LHC, these approaches will allow us to efficiently utilize
jet shapes in searches for new physics.

QCD observables which are not linked to traditional event shapes might also help to distinguish massive
electroweak splittings from QCD backgrounds. The radiation of QCD jets possesses characteristic features
which we can use to discriminate a color octet gluon from a decaying color singlet resonance. Angular ordering
of soft gluon radiation implies that most gluons are emitted in between color connected partners [17, 53]. In
the decay of a color singlet, e.g. H → bb̄, the two decay products are always color connected. In leading
color approximation this is not true for a gluon which splits to bb̄. Its gluon radiation is therefore more likely
to be outside the bb̄ cone.

Two observables might exploit this feature in the top tagging framework. The pull vector [54] can be
defined for each individual jet in an event

\vec{t} = \sum_{a \in jet} \frac{p_{T,a} \, |\vec{r}_a|}{p_{T,jet}} \, \vec{r}_a . \qquad (23)

Here, r_a is the constituent position relative to the jet and p_T,a is the transverse momentum of this constituent.
The angle between the pull vectors of different jets can be used to decide if two b-tagged jets come from a
color singlet resonance or a color octet gluon. Pull has been tested on W bosons from top decays by D0 [55].
According to this measurement the fraction of uncolored W bosons is 0.56 ± 0.42 (stat+syst), indicating that
pull is a challenging observable already in the relatively clean Tevatron environment.
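
As a minimal sketch of how the pull vector could be evaluated, the Python snippet below sums p_T,a |r_a| r_a / p_T,jet over jet constituents given as (p_T, y, φ). The function names are hypothetical, the jet axis is passed in by hand, and the jet transverse momentum is approximated by the scalar sum of the constituent p_T; none of these choices is taken from the references.

```python
import math

def wrap_phi(dphi):
    """Map an azimuthal difference into the interval (-pi, pi]."""
    while dphi > math.pi:
        dphi -= 2.0 * math.pi
    while dphi <= -math.pi:
        dphi += 2.0 * math.pi
    return dphi

def pull_vector(constituents, jet_y, jet_phi):
    """Pull vector (t_y, t_phi) for constituents given as (pt, y, phi)."""
    pt_jet = sum(pt for pt, _, _ in constituents)  # scalar-sum approximation
    t_y = t_phi = 0.0
    for pt, y, phi in constituents:
        dy = y - jet_y
        dphi = wrap_phi(phi - jet_phi)
        r = math.hypot(dy, dphi)
        t_y += pt * r * dy / pt_jet
        t_phi += pt * r * dphi / pt_jet
    return t_y, t_phi

def pull_angle(t, direction):
    """Opening angle between a pull vector and a reference direction in (y, phi)."""
    return abs(wrap_phi(math.atan2(t[1], t[0]) - math.atan2(direction[1], direction[0])))
```

For two jets one would typically compare the pull vector of the first jet with the direction pointing towards the second jet: a small pull angle indicates radiation between the two jets, as expected for color connected partners.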

As a second observable dipolarity [56] can help to select the correct W decay products in a boosted top
decay. Compared to the pull angle, its definition is modified such that all radiation off the dipole is captured
in one (sub-)jet. For a jet splitting into two subjets j_1 and j_2 dipolarity is defined on all calorimeter objects
a as

D = \frac{1}{R_{j_1 j_2}^2} \sum_{a \in jet} \frac{p_{T,a}}{p_{T,jet}} \, R_a^2 , \qquad (24)

where R_a is the distance between the constituent a and the line segment that runs from j_1 to j_2. Using the
HEPTopTagger framework it was shown that dipolarity might be able to reduce the mistag rate significantly.
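
The geometric ingredient of dipolarity, the distance of each constituent to the line segment between the two subjet axes, can be sketched in a few lines of Python. The sketch assumes the p_T-weighted form D = (1/R_12²) Σ_a (p_T,a/p_T,jet) R_a², constituents given as (p_T, y, φ), a scalar-sum jet p_T, and a jet far enough from φ = ±π that the azimuthal wrap-around can be ignored; the function names are hypothetical.

```python
import math

def dist_to_segment(p, a, b):
    """Distance from point p to the segment a-b in the (y, phi) plane."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the closest point stays on the segment.
    s = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + s * dx), py - (ay + s * dy))

def dipolarity(constituents, j1, j2):
    """constituents: (pt, y, phi); j1 and j2: subjet axes as (y, phi)."""
    r12_sq = (j1[0] - j2[0]) ** 2 + (j1[1] - j2[1]) ** 2
    pt_jet = sum(pt for pt, _, _ in constituents)  # scalar-sum approximation
    weighted = sum(pt * dist_to_segment((y, phi), j1, j2) ** 2
                   for pt, y, phi in constituents)
    return weighted / (pt_jet * r12_sq)
```

Radiation concentrated along the line between the two subjets gives values close to zero, while radiation away from the dipole axis increases D.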

III. QCD EFFECTS



Hadronic final states of hard interactions resulting from proton-bunch crossings at the LHC are subject to
many sources of QCD radiation. Final state radiation consists of soft and collinear jets radiated off the produced
particles, in our case the top quark. It can be described well using the parton shower, and radiation off
heavy states is suppressed. Initial state radiation consists of soft and collinear jets radiated off the incoming
partons, arising because these partons have to bridge the gap in scale between the proton and the hard
process. In the collinear limit they are also well described by the parton shower; in the harder regime they
require matrix element corrections [17].

Underlying event is additional soft QCD activity arising from a given proton-proton interaction and surrounding
the hard event. It is caused by semi- or non-perturbative interactions between the proton remnants.
The soft continuous underlying event radiation can have a large effect on the jet mass and critically depends
on the size R of the fat jet [57]

\langle \delta m_j^2 \rangle \simeq \Lambda_{UE} \, p_{T,j} \left( \frac{R^4}{4} + \frac{R^8}{4608} + O(R^{12}) \right) . \qquad (25)

At the LHC, the amount of transverse momentum of the underlying event radiation per unit rapidity, Λ_UE,
is roughly O(10) GeV [58].
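
The strong dependence on the jet size is easy to quantify numerically. The short Python sketch below evaluates the expansion ⟨δm_j²⟩ ≃ Λ_UE p_T,j (R⁴/4 + R⁸/4608), truncated after the R⁸ term. The function name is hypothetical and the default Λ_UE = 10 GeV is only the order-of-magnitude estimate quoted above, not a fitted value.

```python
def ue_mass_shift_sq(pt_jet, radius, lambda_ue=10.0):
    """Underlying-event shift of the squared jet mass in GeV^2.

    pt_jet and lambda_ue are in GeV; the series is truncated after R^8.
    """
    return lambda_ue * pt_jet * (radius ** 4 / 4.0 + radius ** 8 / 4608.0)

# Compare a fat jet with a standard narrow jet at the same transverse momentum.
for radius in (0.5, 1.0, 1.5):
    print(radius, ue_mass_shift_sq(500.0, radius))
```

Going from R = 0.5 to R = 1.5 increases the shift by roughly 3⁴, i.e. about two orders of magnitude, which is why fat jets require the grooming procedures discussed in this section.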

Finally, pile-up is the effect of multiple proton-proton collisions in one beam crossing. Its effects are already
observed and are expected to become even harder to deal with once the LHC runs at design energy and
design luminosity. Pile-up can add up to 100 GeV of soft radiation per unit rapidity [59].

As discussed in Sec. II the k_T and C/A algorithms, for a virtuality ordered and an angular ordered shower, respectively, aim to
reverse the shower evolution. Approximately, they preserve the physical picture of the jet evolution from the 
hard scale to the hadronization scale in the recombination sequence. Initial state radiation, underlying event 
and pile-up spoil this picture and add noise to the jet clustering. Jet-mass-based algorithms using subjets 
as part of the reverse-engineered cluster history are sensitive to a distortion by uncorrelated soft radiation. 

An additional complication in identifying events with hadronically decaying electroweak resonances is that 
splittings of quarks and gluons can geometrically induce a large jet mass, 

\langle m_j^2 \rangle \simeq C_i \, \frac{\alpha_s}{\pi} \, p_{T,j}^2 R^2 , \qquad (26)

where C_i = 3 (4/3) are the color factors for gluon (quark) induced jets [50]. For very hard jets this value
can become of the order of the electroweak scale. This makes initial state radiation associated with heavy
particle production dangerous, in particular in events with generically large jet multiplicity. For the top
tagger it also means that while p_T,j and R are required to be large to capture all decay products, they should
not become too large.

To discriminate a hadronically decaying heavy resonance from a QCD jet, e.g. using its invariant mass,
all final state radiation has to be properly recombined. This requires that we can separate it from initial state
radiation, underlying event and pile-up. While underlying event and pile-up tend to be soft compared to the
decay products of a boosted resonance, initial state radiation is not [32]. Its typical transverse momentum
can be of the same order as a W decay jet, in particular for moderately boosted top quarks. Therefore,
different substructure approaches are needed to cope with underlying event/pile-up and with initial state
radiation.



Jet grooming methods, like filtering (Sec. III A), trimming (Sec. III B) and pruning (Sec. III C), remove soft
uncorrelated radiation from a fat jet while retaining final state radiation off the resonance. For QCD jets
grooming methods reduce the upper end of the jet mass distribution, whereas for signal events they yield
a sharper peak near the true resonance mass m_j = m_res. To keep these methods generic it is implicitly
assumed that for boosted heavy particles p_T,FSR > p_T,(ISR,UE,PU). Thus, the transverse momentum of the
subjets is an important criterion to discriminate between final state radiation and other radiation. Using
soft-collinear effective theory it has recently been shown that under certain conditions grooming techniques
factorize [61].

As a matter of fact, the problem of QCD effects inside geometrically large jets was noticed early on by
the authors of Ref. [H_2]. This is why their 'top tagger' is based on narrow k_T jets for the top decay products
which are then combined in the spirit of the C/A algorithm.

A. Filtering

Filtering, the first proposed jet grooming method [28], was introduced as part of the BDRS Higgs tagger.
Its target application is HW and HZ production with a leptonic decay of the gauge bosons, i.e. events with
relatively low jet multiplicity. For the fat Higgs jets it applies the mass drop algorithm described in Sec. II A
to extract the relevant bb̄ subjets and their geometric structure. The size of the relevant two subjets is large
enough to contain most of the QCD radiation from the Higgs decay, i.e. soft-collinear enhancement forces
gluon radiation off the b-quarks to be almost entirely emitted into the two subjets.

To further improve the resolution the constituents of the two b-tagged subjets can be recombined into
smaller C/A-subjets of size

R_{filter} = \min \left( 0.3, \frac{R_{bb}}{2} \right) . \qquad (27)

This zooming-in obviously reduces the effective area of the fat jet considered for mass reconstruction and
this way tames any QCD effects scaling with R, e.g. as shown for the underlying event in Eq.(25).

For the Higgs boson the best mass resolution is achieved by reconstructing the Higgs mass from the
n_filter = 3 hardest filtered subjets. This means we include two b-jets and the hardest wide-angle gluon
radiation. Two free parameters, R_filter and n_filter, control the filtering performance.
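
The filtering step can be illustrated with a toy implementation in Python. The sketch below reclusters a set of constituents with a simple O(N³) inclusive Cambridge/Aachen algorithm of size R_filter = min(0.3, R_bb/2), keeps the n_filter hardest subjets and returns their invariant mass. It is a minimal sketch and not the code used in the analyses cited here: constituents are four-vectors (E, px, py, pz), E-scheme recombination is assumed, and all function names are hypothetical.

```python
import math

def massless(pt, y, phi):
    """Four-vector (E, px, py, pz) of a massless particle."""
    return (pt * math.cosh(y), pt * math.cos(phi), pt * math.sin(phi), pt * math.sinh(y))

def y_phi(p):
    e, px, py, pz = p
    return 0.5 * math.log((e + pz) / (e - pz)), math.atan2(py, px)

def delta_r(p, q):
    y1, phi1 = y_phi(p)
    y2, phi2 = y_phi(q)
    dphi = abs(phi1 - phi2)
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def cluster_ca(particles, radius):
    """Inclusive C/A clustering: merge the closest pair while its distance is below R."""
    jets = [tuple(p) for p in particles]
    while len(jets) > 1:
        dmin, i, j = min((delta_r(jets[a], jets[b]), a, b)
                         for a in range(len(jets)) for b in range(a + 1, len(jets)))
        if dmin > radius:
            break  # all remaining pairs are separated by more than R
        merged = tuple(x + y for x, y in zip(jets[i], jets[j]))
        jets = [p for k, p in enumerate(jets) if k not in (i, j)] + [merged]
    return jets

def filtered_mass(constituents, r_bb, n_filter=3):
    """Mass of the n_filter hardest subjets of size min(0.3, r_bb / 2)."""
    r_filter = min(0.3, r_bb / 2.0)
    subjets = cluster_ca(constituents, r_filter)
    subjets.sort(key=lambda p: math.hypot(p[1], p[2]), reverse=True)
    return mass([sum(c) for c in zip(*subjets[:n_filter])])
```

For two hard prongs, one wide-angle gluon and a few soft diffuse particles, filtered_mass drops the soft particles and returns a smaller mass than the sum over all constituents. In a real analysis the clustering would of course be delegated to a dedicated jet library.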

The effect of the different steps of the BDRS algorithm, including filtering, we show in Fig. 11. The original
object is a C/A fat jet of size R = 1.2, selected as the hardest jet in the HZ signal and background events.
To ensure that the Higgs decay products are contained inside the fat jet without including many events
where the b quarks cannot be resolved, the analysis requires p_T,Z = 200...250 GeV. We see that for the signal
the mass drop has no big effect on the fat jet mass. Essentially all jets pass the mass drop criterion and only
little soft radiation is removed. The additional filtering clearly sharpens the peak of the reconstructed Higgs
mass and moves it towards the input Higgs mass m_H = 115 GeV.

For the QCD background most events do not pass the mass drop condition, which can be seen in the
normalization of the upper two curves in the right panel of Fig. 11. The corresponding jet mass distribution
after the mass drop becomes less steep, indicating that following Eq.(26) more energetic jets are more likely to
pass the mass drop criterion. The additional filtering only has a mild effect on the continuum reconstructed
mass. This feature is very helpful because it means that filtering does not sculpt the backgrounds, in spite
of the carefully chosen parameters in Eq.(27). It allows us to analyze hadronic Higgs decays including side
bands to determine the backgrounds.



Figure 11: Mass of the hardest jet in the pp → HZ final state before mass drop, after mass drop, and after filtering
(from the left). The very right panel shows all three curves (top to bottom) for the Zbb̄ continuum background. In
the BDRS analysis the C/A fat jet has size R = 1.2 and the leptonic Z is required to have p_T,Z = 200...250 GeV.
Figure from Ref. [63].

Applying filtering inside a top tagger is a straightforward generalization. Starting with the HEPTopTagger
algorithm described in Sec. II E we include a filtering stage and reconstruct all jet masses from filtered subjets.
The effect is the same as shown for the Higgs tagger in Fig. 11: both W and top mass peaks are sharpened
and moved towards the input values. Unlike for the Higgs tagger, all mass windows are part of the tagging
algorithm, so there are no side bands in the usual sense. Still, choosing different top masses for the MC
simulation and inside the tagger we can check that the tagger does not sculpt the backgrounds [8].

As mentioned above, the two filtering parameters R_filter and n_filter have to be tuned including realistic
simulations of pile-up and detector effects. We find that using the R_filter value from Eq.(27) and n_filter = 5
gives the best performance for the more intensively radiating hadronic top quark [64].



B. Trimming 

Trimming [65] targets very similar effects as filtering. In the first step we reconstruct a fat jet which will
be heavily impacted by QCD radiation. Its subjets we re-combine with a higher resolution R_trim, defining a
larger number of smaller subjets. These subjets can be separated into two categories: hard and soft. This
discrimination is based on the transverse momentum, so hard subjets obey

p_{T,j} > f_{trim} \, \Lambda_{trim} , \qquad (28)

where f_trim is an adjustable parameter and Λ_trim is an intrinsic scale of the fat jet. It can for example be
chosen as its jet mass or its transverse momentum. While we discard all soft subjets the re-combined hard
subjets define a trimmed (fat) jet. Just like filtering this reduces the effective size of the fat jet entering any
kind of jet mass measurement.

Because Λ_trim can be different for each fat jet the trimming procedure is self-adaptive: for a fat jet with
large transverse momentum and/or mass the subjets need to have a larger transverse momentum to stay 
inside the trimmed jet. Just as the filtering procedure, trimming requires two input parameters. However, 
because it is self-adaptive the results are less sensitive to the origin of the fat jet and trimming can be used 
as a generic tagging tool in a multi-jet environment. 
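
The selection of hard subjets in trimming can be written down in a few lines. The Python sketch below assumes that the fat jet has already been reclustered into small subjets of size R_trim, represented as four-vectors (E, px, py, pz), and keeps those with p_T above f_trim times the chosen scale of the fat jet. The function names, the scale switch and the default f_trim = 0.03 are illustrative assumptions only.

```python
import math

def pt(p):
    return math.hypot(p[1], p[2])

def add(vectors):
    """Component-wise sum of a list of four-vectors (E, px, py, pz)."""
    return tuple(sum(c) for c in zip(*vectors))

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def trim(subjets, f_trim=0.03, scale="pt"):
    """Return the trimmed jet built from subjets with pT > f_trim * Lambda_trim.

    Lambda_trim is the pT (scale="pt") or the mass (scale="mass") of the fat jet.
    """
    fat_jet = add(subjets)
    lambda_trim = pt(fat_jet) if scale == "pt" else mass(fat_jet)
    hard = [s for s in subjets if pt(s) > f_trim * lambda_trim]
    return add(hard) if hard else (0.0, 0.0, 0.0, 0.0)
```

Because the threshold is recomputed from each fat jet, the same value of f_trim removes a 5 GeV subjet from a 500 GeV jet but keeps it in a 100 GeV jet, which is the self-adaptive behavior described above.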



C. Pruning 

Unlike filtering or trimming, pruning removes underlying event and pile-up while building the jet, i.e. as
part of the jet algorithm. In a first step it defines a fat jet which can be based on any of the recombination
algorithms described in Sec. II. In a second step its constituents are pruned by checking in every recombination
step

\frac{\min(p_{T,j_1}, p_{T,j_2})}{p_{T,j}} < z_{prune} \qquad and \qquad \Delta R_{j_1 j_2} > R_{prune} . \qquad (29)

If both conditions are met, the merging j_1, j_2 → j is vetoed. Just as filtering and trimming, pruning depends
on two parameters: z_prune and R_prune. A global value for z_prune ensures that recombined well separated
subjets are not very asymmetric in p_T. In contrast, R_prune can be determined on a jet-by-jet basis. Thus,
pruning is, similar to trimming, a self-adaptive procedure, applicable to a multi-jet final state in an unbiased
resonance search.



Consequently, for the C/A-algorithm all subjets are merged unless Eq.(29) holds and the minimal distance
between the subjets is R_prune. Once the ΔR condition is true it automatically holds true for higher-level
subsequent mergings, so only the z_prune condition needs to be considered. Thus, in trimming the transverse
momentum of the subjets is a parameter which changes on a jet-to-jet basis, whereas in pruning the size of
the subjets changes from jet to jet.
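
The veto logic of pruning can be sketched as a modified reclustering loop. The toy Python implementation below reclusters the constituents of a fat jet with a simple Cambridge/Aachen ordering and, whenever a merging is both asymmetric (z < z_prune) and wide-angle (ΔR > R_prune), discards the softer branch instead of merging. Two points are assumptions of this sketch rather than statements of the text: the softer branch is dropped on a veto, following the original pruning proposal, and R_prune defaults to m_j/p_T,j of the unpruned jet. All names are hypothetical.

```python
import math

def massless(pt_, y, phi):
    """Four-vector (E, px, py, pz) of a massless particle."""
    return (pt_ * math.cosh(y), pt_ * math.cos(phi), pt_ * math.sin(phi), pt_ * math.sinh(y))

def pt(p):
    return math.hypot(p[1], p[2])

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def mass(p):
    e, px, py, pz = p
    return math.sqrt(max(e * e - px * px - py * py - pz * pz, 0.0))

def delta_r(p, q):
    y1 = 0.5 * math.log((p[0] + p[3]) / (p[0] - p[3]))
    y2 = 0.5 * math.log((q[0] + q[3]) / (q[0] - q[3]))
    dphi = abs(math.atan2(p[2], p[1]) - math.atan2(q[2], q[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return math.hypot(y1 - y2, dphi)

def prune(constituents, z_prune=0.1, r_prune=None):
    """Recluster with C/A ordering and veto soft, wide-angle mergings."""
    fat_jet = constituents[0]
    for p in constituents[1:]:
        fat_jet = add(fat_jet, p)
    if r_prune is None:
        r_prune = mass(fat_jet) / pt(fat_jet)  # jet-by-jet angular scale (assumed)
    jets = [tuple(p) for p in constituents]
    while len(jets) > 1:
        dr, i, j = min((delta_r(jets[a], jets[b]), a, b)
                       for a in range(len(jets)) for b in range(a + 1, len(jets)))
        first, second = jets[i], jets[j]
        merged = add(first, second)
        z = min(pt(first), pt(second)) / pt(merged)
        rest = [p for k, p in enumerate(jets) if k not in (i, j)]
        if z < z_prune and dr > r_prune:
            jets = rest + [max(first, second, key=pt)]  # veto: drop the softer branch
        else:
            jets = rest + [merged]
    return jets[0]
```

In this toy version a soft particle collinear to a hard prong survives, because its merging fails the ΔR condition, while an equally soft particle at wide angle is removed.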



Unlike filtering and trimming, pruning can be considered a full self-adaptive tagging algorithm, as discussed
in Sec. II D [25, 33, 66]. Fig. 12 shows how pruning extracts a sharp peak in the jet mass spectrum by
removing soft radiation in the recombination procedure. For QCD background events it moves the typical
jet masses from values in the 30...50 GeV range to the expected values of m_j < 10 GeV, even in the presence
of underlying event.

Figure 12: Jet mass with and without underlying event before (left) and after pruning (right). Signal top jets are
shown in the top row, background QCD jets on the bottom. Figures from Ref. [25].



D. Filtripruning 

In Secs. III A, III B and III C we describe three grooming techniques, focusing on their similarities and
differences. At this stage they mainly act as tools to remove unwanted QCD activity from fat jets, which
means they are indispensable tools for any substructure analysis. Filtering and trimming target a very similar
problem with an equally similar approach. Both are mainly targeted at properly defining the objects for the
tagging algorithms. Pruning works very differently in the sense that it can be used to actively contribute
to the top tagging criteria; comparing Secs. II C and III C indeed shows that the pruning and top tagging
algorithms are not easily separated. The general question therefore becomes at what level we can utilize
observables inside the grooming procedure to improve top tagging.



In Fig. 13 we show the performance of pruning, trimming and filtering on dijet and tt̄ events for one
transverse momentum slice of the fat jet, p_T,j = 500...600 GeV [13]. It is based on reasonable, but not
optimized parameters for trimming (R_sub = 0.35, f_trim = 0.03, Λ_trim = p_T,j), pruning (z_prune = 0.1, R_prune =
m_j/p_T,j), and filtering (R_filter = 0.35, n_filter = 3). We see that the top mass is well reconstructed by all three
grooming methods. The second peak appears because in some events the fat jet only captures the W decay
subjets or the bottom quark is too soft to obey p_T,b > p_T,(UE,PU). In those cases pruned and trimmed subjets
nicely reconstruct the W mass. Filtered subjets using n_filter = 3 do not reconstruct the W boson because
they are not adapted automatically.

For the QCD background all algorithms reduce the QCD-jet mass by removing soft radiation as compared
to the uncorrected values up to m_j = 150 GeV. Again, this figure illustrates how crucial jet grooming is before
we can in any way rely on a measured jet mass. In contrast to the signal reconstruction the three approaches
perform differently. Pruning, with its most sophisticated modelling of the QCD splittings, comes closest to the
expected QCD range m_j < 10 GeV. For the chosen set of parameters trimming and in particular filtering
perform much worse. While pruning internally vetoes mergings which are wide angle and asymmetric in
p_T, filtering always recombines three subjets irrespective of their p_T and relative position inside the jet.
QCD radiation is enhanced in the soft and collinear limits, but the jet mass is most sensitive to wide angle
radiation. In the detector a QCD jet will on average only show few isolated spots of energy accompanied by
more continuous radiation from pile-up and underlying event. Thus, with the fixed choice of three subjets of size
R = 0.35 filtering loses most of its ability to reject the dijet background.

Figure 13: Jet mass for tt̄ events (left) and QCD backgrounds (right) for all three jet grooming techniques. The
solid curve (red) represents the uncorrected anti-k_T jet mass for R = 1.0 and 500 < p_T,j < 600 GeV. Figures from
Ref. [13].



In Ref. [34] it was first shown that because grooming techniques treat the QCD effects, in particular in
background events, differently they can in combination improve the tagging algorithm. Following the BDRS
approach, see Sec. II A, a C/A fat jet of size R = 1.2 is selected with p_T,j > 200 GeV and |y| < 2.5. After
passing the mass drop requirement of Eq.(6) the jet is either filtered, trimmed, or pruned. All grooming
methods reconstruct identical jet masses for the signal events, as we see in the left panel of Fig. 14. However,
while trimming and filtering perform similarly on the Z+jet background, pruning tends to reconstruct larger
jet masses than filtering or trimming, as shown in the right panel of Fig. 14. Thus, counting the number
of events in the two-dimensional jet mass region around the input Higgs mass (115 ± 5) GeV, S/B can be
improved by up to a factor of two [34]. The same approach can be used to reconstruct Z bosons in
H → ZZ decays [67, 68]. In a similar spirit many different jet substructure observables can be combined to
tag boosted W bosons [69].

Applying different grooming techniques also significantly improves top tagging, for example using filtered
as well as pruned jet masses in the HEPTopTagger framework [35]. After the mass drop identification all
possible three-subjet combinations with filtered jet masses of m_jjj = 160...200 GeV are kept. In this step
filtering can simply be replaced by pruning, such that eventually the filtered and pruned mass values can be
compared. Without any optimization of the grooming parameters an improvement of S/B by a factor two
seems realistic. A generalization to other taggers is straightforward.

Figure 14: Groomed jet masses in HZ (left) and Z+jets (right) events. For example, pruning is compared to filtering.
All three algorithms reconstruct the Higgs mass equally well, while filtered or trimmed QCD-jet masses are smaller
than pruned jet masses. Figures from Ref. [34].

IV. PERFORMANCE

Above all, top taggers are tools to reconstruct top quarks as precisely and unambiguously as possible.
They are designed to support searches for new physics in otherwise inaccessible final states and phase space
regions. Because of the many sources of radiation in hadronic final states we cannot infer their performance
from data driven methods only. Instead, to evaluate the reliability of taggers in a hadronic environment we
need to also estimate their tagging efficiencies, background fake rates and top-momentum reconstruction
using Monte Carlo programs. However, it is difficult to reproduce the full complexity of high-p_T events at
the LHC using Monte Carlo programs. Therefore, after estimating tagging efficiencies using event generators
we have to validate the predictions on data.

Although the concepts of the taggers are quite different, all of them make use of information contained in 
the structure of the fat jet. Therefore, all taggers are subject to the question to what extent we can make 
use of this information at the LHC. Thus, already by studying some of the taggers in detail the experimental 
collaborations can obtain insights valuable for all taggers and jet substructure methods. 



Consequently, the interplay between the evaluation of top taggers on Monte Carlo samples (Sec. IV A),
the application of taggers on data (Sec. IV B) and the identification of interesting new physics scenarios
(Sec. IV C) is of great importance for the optimal use of top taggers at the LHC.



A. Comparisons 

Most of the top taggers described in Sec. II are publicly available and included in collections of jet tools [TH, 70].
Due to the different approaches in the reconstruction of the top quark it can be instructive to study how
they perform on simulated event samples. For the BOOST 2010 proceedings Herwig [71] and Pythia [72]
signal and background samples have been prepared for such an analysis [73]. All samples are divided in
equally-sized sub-samples with parton p_T ranges from 200-300 GeV, 300-400 GeV, ..., 1.5-1.6 TeV,
thus covering the full range from topologies with moderate boost to extremely energetic events. For each
p_T bin 10,000 events were generated. Combining all samples yields an approximately flat p_T distribution.
These samples were used in Ref. [13] to compare a subset of the taggers described in Sec. II. For each event, jets
were clustered with the anti-k_T algorithm with an R-parameter of 1.0. In this study, for each top-tagging
algorithm, the input parameters were optimized for each efficiency by minimizing the mistag rate. Because
the top quark decays in a three body decay and the finite jet size can result in losing one of the decay
products, different definitions of tagging efficiency come to mind, e.g. the correct reconstruction of the top
quark mass or the reconstruction of the top quark momentum. In Ref. [13] the tagging efficiency and mistag
rate were defined as the number of top-tags divided by the total number of anti-k_T jets in the signal
and background sample, respectively.
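
With this definition both numbers are simple ratios of tagged jets to all jets in the corresponding sample. A minimal Python sketch, assuming each jet has already been assigned a boolean tagging decision (the function name and the toy inputs are hypothetical):

```python
def tag_rate(tags):
    """Fraction of jets in a sample which carry a top tag."""
    tags = list(tags)
    if not tags:
        raise ValueError("empty jet sample")
    return sum(1 for tagged in tags if tagged) / len(tags)

# Toy tagging decisions for anti-kT jets in a signal and a background sample.
signal_tags = [True, True, False, True]
background_tags = [False, False, True, False, False, False, False, False]

efficiency = tag_rate(signal_tags)       # tagged signal jets / all signal jets
mistag_rate = tag_rate(background_tags)  # tagged background jets / all background jets
```

Scanning the tagger input parameters and recording (efficiency, mistag_rate) for each setting traces out the kind of working-point curves that such comparisons are based on.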

All tagging efficiencies are flat, particularly for the Thaler/Wang and ATLAS top taggers. At working points
with small efficiency both have a large fake rate, but at working points with large tagging efficiencies
their fake rate is comparably small. The CMS and Hopkins top taggers are very similar in design. Unsurprisingly,
they perform very similarly over the whole efficiency range.

For the BOOST 2011 proceedings [14] the list of taggers and the samples were extended. New Herwig++ [74]
and Sherpa [75] samples were generated with 100,000 unweighted events for each p_T slice. The
Sherpa samples allow us to examine the effects of higher-order matrix elements [76]. The list of taggers was
extended by a trimming-based tagger, N-subjettiness and the HEP tagger. The input parameters for the
taggers were optimized for the individual efficiency working points. To make the study as realistic as possible
a simple calorimeter simulation was included [77]. This simulation smears energy according to a radial profile
based on performance of the ATLAS detector.

In Fig. 15 (right) we include results for matched Sherpa samples with unoptimized input. Comparing
the results of the unmatched Herwig samples of Fig. 15 (left) with the matched Sherpa samples (right), the
efficiencies of all taggers decline when applied to matched samples. In both comparisons the Thaler/Wang
and ATLAS tagger perform slightly worse than the other taggers. The mass drop and jet shape based taggers
however perform very similarly.

Only further detailed studies on data can provide a final assessment which tagger performs favorably
compared to others. Most likely tagging performance is subject to the fat jet cone size, the transverse
momentum of the top and the amount of additional uncorrelated radiation in the event. Thus, it is possible
that not one tagger alone will be able to cover the whole spectrum of final states in an optimal way.

Figure 15: Mistag rate versus efficiency after optimization for the studied top-taggers. Left: tagging rates averaging
over all Herwig generated p_T subsamples of Ref. [73]. Right: tagging rates for Sherpa generated CKKW samples with
p_T = 500-600 GeV, as described in Ref. [14]. It includes a simple detector simulation [77]. Figures from Refs. [13] and
[14].



B. Data 



Early jet substructure measurements are already available from HERA [78-80]. The ZEUS collaboration
studied the average number of subjets in ep collisions and the distribution of energy flow within jets. These
measurements are well described by QCD and present Monte Carlo simulations.

At the Tevatron both experiments, D0 and CDF, measured the substructure of QCD jets. D0 studied the
k_T-subjet multiplicity for central jets of size R = 0.5 and p_T = 55...100 GeV at √s = 0.63 and 1.8 TeV. It
turns out that the average number of subjets in jets which can be linked to hard gluons is significantly larger
than in quark-related jets, as expected from the color charge and the form of the splitting kernels [81]. It is
worth noting, however, that any link of an observed jet to hard quarks or gluons is somewhat at odds with
our QCD splitting picture and at least scale dependent.

Based on 0.17 fb^-1 of Run II data CDF measured jet shapes in inclusive jet production and compared
them with different underlying event tunes [82]. Using 5.95 fb^-1 of data CDF was the first experiment to
look at substructure of massive jets with p_T > 400 GeV [83]. Both Midpoint and anti-k_T jet algorithms were
used with jet sizes of R = 0.4, 0.7, and 1.0. The measured jet shapes are the jet mass, angularity (Eq.(18)),
and planar flow (Eq.(17)). For the jet mass the theory predictions are in good agreement with the data. In
contrast, for low angularity and planar flow the Pythia predictions show some level of disagreement. Using
the same data sample CDF finds an upper cross section limit of 38 fb (at 95% CL) for boosted top quarks.
This is approximately one order of magnitude higher than the estimated Standard Model rate, and is limited
by the QCD background rates. However, it is the most stringent limit on boosted top quark production to
date.



Spurred by very positive Monte Carlo results for new physics searches [84], ATLAS and CMS both initiated
studies on jet substructure. An early ATLAS study on differential and integrated jet shapes [85] uses 3 pb^-1
of data to look at anti-k_T jets with p_T = 30...600 GeV. The data shows sensitivity to the details of the
parton shower, fragmentation, and underlying event models in the Monte Carlo generators. However, for an
appropriate choice of the parameters the agreement between theory and experiment is good.

Pile-up is considered to be the biggest risk for the applicability of more elaborate jet substructure methods.
As a test ATLAS measures the jet substructure including mass drops and filtering using 35 pb^-1 of data at
√s = 7 TeV [86]. Jets are selected using the anti-k_T algorithm with size R = 1.0 and the C/A-algorithm
with R = 1.2. They are required to be central, |y| < 2, and hard, p_T,j > 300 GeV. The jet mass distributions
for both jet definitions are then compared with Monte Carlo predictions from Pythia and Herwig++. In
addition, a set of substructure observables is included. For the anti-k_T jets this is the first k_T-splitting scale
√d_12 as defined in Eq.(2); for C/A-jets it is the mass distribution after requiring a mass drop and filtering
the subjets.

Figure 16: k_T splitting scale (left), jet mass after mass drop and filtering (middle) and dependence of the jet mass
on the number of primary vertices in the event (right). Figures from Ref. [86].

The left panel of Fig. 16 shows that the measurement of the k_T splitting scale agrees within 10-20%
of the Monte Carlo predictions over the whole analyzed range. We know that the un-groomed jet mass for
large jets suffers from pile-up contributions. For this uncorrected jet mass different Monte Carlo predictions
from Pythia and Herwig++ show sizable deviations from the data. However, after applying a mass drop
and subjet filtering, as described in Sec. II A, the jet mass distributions agree well between theory and
experiment, see Fig. 16.



Finally, the number of primary vertices is a measure for pile-up. The right panel of Fig. 16 shows the
mean jet mass before and after splitting and filtering as a function of the number of primary vertices.
After requiring an additional large angular subjet separation R_{j1,j2} > 0.3 the splitting naturally selects more
massive jets. The tilted lowest line shows the mean mass of jets which pass the splitting but before filtering.
The filtering step then significantly reduces the impact of pile-up and the slope shown in Fig. 16 is consistent
with zero.

Already with the 2011 data CMS performed a search for heavy Z' and color octets decaying to a $t\bar{t}$ final 
state [87]. It relies on the CMS top tagger, a variant of the Johns Hopkins tagger introduced in Sec. II D. 
All jets are reconstructed by the C/A algorithm with $R = 0.8$. In one set of events, called 'type 1+1', at least 
two jets with $p_{T,j} > 350$ GeV have to be present and allow for a top reconstruction. The 'type 1+2' set 
requires a leading jet with $p_{T,j} > 350$ GeV, a second with $p_{T,j} > 200$ GeV, and a third with $p_{T,j} > 30$ GeV. The 
hardest jet needs to correspond to a hadronic top whereas the other two are reconstructed using pruning. 
Combining both topologies, a Randall-Sundrum KK gluon with mass between 1.0 and 1.5 TeV is excluded 
at 95% CL, based on 886 pb$^{-1}$ of integrated luminosity, see Fig. 17. 
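
The transverse momentum thresholds of the two event topologies can be written as a short classification function. This is only a sketch of the kinematic part of the selection: the function name and the choice to test the 'type 1+1' condition first are assumptions, and the top tagging and pruning requirements on the individual jets are not modeled.

```python
from typing import List, Optional


def classify_topology(jet_pts: List[float]) -> Optional[str]:
    """Assign an event to 'type 1+1' or 'type 1+2' from its jet pT values in GeV.

    Only the pT thresholds quoted in the text are applied; substructure
    requirements (top tag, pruning) are deliberately left out.
    Assumption: an event passing both sets of thresholds is counted as 'type 1+1'.
    """
    pts = sorted(jet_pts, reverse=True)
    # 'type 1+1': at least two jets above 350 GeV
    if len(pts) >= 2 and pts[1] > 350.0:
        return "type 1+1"
    # 'type 1+2': leading jet above 350 GeV, second above 200 GeV, third above 30 GeV
    if len(pts) >= 3 and pts[0] > 350.0 and pts[1] > 200.0 and pts[2] > 30.0:
        return "type 1+2"
    return None


print(classify_topology([420.0, 380.0, 45.0]))  # two jets above 350 GeV
print(classify_topology([420.0, 250.0, 45.0]))  # one hard jet plus two softer jets
print(classify_topology([300.0, 250.0, 45.0]))  # fails both sets of thresholds
```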

C. Applications 

As outlined in Sec. I, boosted tops arise in a plethora of physics scenarios. Thus, the application of top 
taggers has become increasingly popular in searches for new physics resonances or effects. 

The need to reconstruct top quarks in decays of very heavy resonances was the initial motivation to study 
boosted tops. If the s-channel resonance is heavy and both tops decay hadronically, the decay products are 
highly collimated. To overcome the large dijet background it is unavoidable to study the internal structure 
of the jets. Because the tagging efficiency of the tops does not depend on the spin or the color charge of the 
heavy resonance, the only parameters which determine if the resonance search is feasible are the resonance's 
mass and the production cross section. The scenario of a heavy s-channel resonance decaying to two top 
quarks has been discussed in the context of various Z' and color octet models [f>31 I58"H9"2]. 
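
How strongly the top decay products are collimated can be estimated with the common rule of thumb that the decay products of a particle with mass m and transverse momentum p_T lie within an angular distance of roughly 2m/p_T. The sketch below applies this estimate to tops from a resonance decaying at rest. The function names, the top mass value of 173 GeV, and the at-rest approximation are assumptions made for illustration.

```python
import math

M_TOP_GEV = 173.0  # approximate top mass, used here as an input assumption


def top_momentum_from_resonance(m_res: float, m_top: float = M_TOP_GEV) -> float:
    """Top momentum in GeV for a resonance of mass m_res decaying at rest to a top pair.

    This is the maximal transverse momentum; it is reached for central decays.
    """
    if m_res < 2.0 * m_top:
        raise ValueError("resonance too light to decay into two on-shell tops")
    return math.sqrt((m_res / 2.0) ** 2 - m_top ** 2)


def decay_cone_size(pt: float, mass: float = M_TOP_GEV) -> float:
    """Rule-of-thumb angular size 2*m/pT of the decay products."""
    if pt <= 0.0:
        raise ValueError("pT must be positive")
    return 2.0 * mass / pt


for m_res in (1000.0, 1500.0, 3000.0):
    pt = top_momentum_from_resonance(m_res)
    print(f"m_res = {m_res:6.0f} GeV: pT up to {pt:6.1f} GeV, cone size about {decay_cone_size(pt):.2f}")
```

In this estimate the decay products of a top from a 1.5 TeV resonance fit into a cone of about half the size needed at the lower threshold of 350 GeV, which is why a single geometrically large jet can capture the whole decay.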

Alternatively, top partners [93-96], fourth generation [97] or vector quarks [98] can be sources of boosted 
top quarks. Pair-produced top partners or heavy quarks can decay into tops. In these small cascade decays 
the tops are often accompanied by missing transverse energy, Higgs bosons or gauge bosons, all forming 
useful handles to suppress backgrounds. 

Recently, CDF [99] and DØ [100] measured an unexpectedly large forward-backward asymmetry of the 
top quarks. Measurements of this quantity are subtle at the LHC, due to its proton-proton initial state. 
However, one can define a forward-central charge asymmetry which captures the physics. Unfortunately, for 
the dominating gg initial state at the LHC there is no asymmetry at all. To enhance the subdominant $q\bar{q}$ 
and qg production processes it is beneficial to require a large invariant mass of the $t\bar{t}$ system, i.e. to require 
boosted tops. By reconstructing the momentum of the hadronic top and measuring the charge of the second 
top's lepton, it is possible to count the number of tops and anti-tops in the forward region. This allows us to 
measure the forward charge asymmetry at the LHC [101]. 
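
The counting described above translates into an asymmetry in a straightforward way. As a minimal sketch, we assume the asymmetry is defined as the normalized difference of top and anti-top counts in the forward region and assign a binomial statistical uncertainty. The precise definition used in Ref. [101] may differ, and the function name is hypothetical.

```python
import math
from typing import Tuple


def charge_asymmetry(n_top: int, n_antitop: int) -> Tuple[float, float]:
    """Return (A, sigma_A) with A = (N_t - N_tbar) / (N_t + N_tbar).

    Assumption: sigma_A is the binomial uncertainty sqrt((1 - A^2) / N),
    which ignores backgrounds and systematic effects.
    """
    total = n_top + n_antitop
    if total <= 0:
        raise ValueError("need at least one counted event")
    asym = (n_top - n_antitop) / total
    sigma = math.sqrt((1.0 - asym ** 2) / total)
    return asym, sigma


# Toy counts, invented for illustration only
asym, sigma = charge_asymmetry(n_top=5400, n_antitop=5000)
print(f"A = {asym:.4f} +- {sigma:.4f}")
```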



V. BEYOND HADRONIC TOPS 



Boosted resonances are a natural product of any ultraviolet completion of the Standard Model with new 
decaying particles around the TeV scale. In addition, even for continuum production processes the LHC will 
provide enough energy to probe a boosted regime if background suppression requires it. This means that 
the reconstruction of boosted heavy objects will be useful in a wide range of LHC searches. Particles which 
can be tagged using their hadronic decay channels include W and Z bosons, a light Higgs boson, or the 
top quark. Search strategies which directly use jet substructure in the identification and reconstruction of 
those Standard Model particles can indeed be a superior way of discovering new physics. As an analogy we 
can remind ourselves that for many decades we have not considered the decay products of a B meson the 
appropriate analysis objects in high-$p_T$ searches. 

Top quarks are predominantly produced in pairs at hadron colliders. As discussed in Sec. I, non-boosted 
top quarks decaying to $b\ell\nu$ provide three good handles for reducing QCD backgrounds: a charged lepton 
suitable for triggering, large missing transverse energy and a taggable b jet. Following the logic of hadronic 
top tagging we can ask if boosted leptonic top quarks have useful properties for LHC searches. 

Apart from heavy s-channel resonances, boosted top quarks naturally arise in decays of top partners, e.g. 
supersymmetric stops. The large $t\bar{t}$ backgrounds make the reconstruction of such top partners challenging [7]. 
Using a top tagger on the purely hadronic decays we know how to extract and reconstruct such top 
partners [5]. 

Figure 17: Observed and predicted $t\bar{t}$ invariant mass distribution in the '1+1' topology (left). Limits on the possible 
cross section times branching ratio of $t\bar{t}$ resonances (right). Figure from Ref. [87]. 

For the semi-leptonic sample we start with one hadronically tagged top quark. Next, we need to identify 
and approximately reconstruct the leptonically decaying top quark. First, QCD jets again pose a dangerous 
background, because the rejection through $b$-tagging degrades at large boost and the lepton-$b$ isolation 
becomes marginal AQ\. To ameliorate this problem, we can introduce a tracker-level mini-isolation cut for 
the lepton [102]. This results in a high background rejection rate. To reconstruct the leptonic top momentum 
without relying on the measured missing transverse momentum we define an appropriate coordinate system 
for the top decay products. In those coordinates the neutrino momentum component either in or orthogonal 
to the $\ell$-$b$ decay plane can be approximately neglected [103]. With this assumption the W and top mass 
constraints are sufficient to reconstruct the neutrino and top momenta and in turn use the measured missing 
transverse momentum for background rejection. This again allows us to extract top partner signals out of 
large top pair backgrounds [103]. 
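
To see why the two mass constraints suffice, it helps to write them out. The notation below is ours and not taken from Ref. [103]: $p_\ell$, $p_b$ and $p_\nu$ denote the four-momenta of the lepton, the bottom quark and the neutrino, and the neutrino is taken to be massless.

```latex
\begin{align}
(p_\ell + p_\nu)^2 = m_W^2
  \quad &\Rightarrow \quad
  2\, p_\ell \cdot p_\nu = m_W^2 - m_\ell^2 \approx m_W^2 , \\
(p_\ell + p_\nu + p_b)^2 = m_t^2
  \quad &\Rightarrow \quad
  2\, p_b \cdot p_\nu = m_t^2 - m_W^2 - m_b^2 - 2\, p_b \cdot p_\ell .
\end{align}
```

A massless neutrino has three unknown momentum components. Once one of them is neglected in the chosen coordinate system, two unknowns remain, and the two conditions above determine them, possibly up to a discrete ambiguity because the neutrino energy depends non-linearly on its momentum.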



As described in Sec. II A, the first applications of jet substructure were the W reconstruction from a heavy 
Higgs decay [TT] or from $WW \to WW$ scattering [53]. The latter can provide important insights into the 
nature of electroweak symmetry breaking, especially if no Higgs boson is found. Adding information on the 
polarization of the W bosons to this analysis should enhance its sensitivity [104]. 

If a heavy Higgs boson decays mostly into gauge bosons the branching ratio into the cleaner leptonic final 
state is either small (for Z bosons) or difficult to reconstruct (for W bosons). New substructure methods 
can help to disentangle the hadronically decaying gauge bosons from the QCD backgrounds, allowing for a 
reconstruction of the Higgs mass [f>71 EH] and spin properties [SS]. 

In the busy environment of the LHC the identification of hadronic W and Z decays is hard and it can be 
beneficial to combine several observables to discriminate them from QCD backgrounds. In Ref. [59] a large 
number of observables is combined in a multivariate approach. For highly boosted W bosons this technique 
might be able to improve on the BDRS method significantly. 

As a third massive Standard Model particle a light Higgs boson with a hadronic decay $H \to b\bar{b}$ is a natural 
candidate for tagging. As a matter of fact, this is precisely the channel where the BDRS tagger re-started 
the broad tagging effort for the LHC. In the Standard Model the Higgs tagger can be applied to the WH 
and ZH production processes [35, 105, 106] or to $t\bar{t}H$ production [TU]. Higgs bosons as decay products of 
new particles can for example appear from heavy s-channel Z' decays [107, 108] or supersymmetric cascade 
decays pfl fTfMm]. 

Finally, new particles themselves could be tagged. Light weakly interacting states for example arise in 
extended Higgs sectors. A light additional CP-odd scalar which couples to the Higgs potential can become 
an intermediate step in a four-body Higgs decay. If this scalar is light enough it will be boosted even in the 
decays of a SM-like Higgs boson with $m_H < 200$ GeV. Such a decay is a perfect scenario for tagging methods 
and essentially impossible to detect using standard methods [114-116]. 



VI. OUTLOOK 



In this paper we have given an overview of different top taggers and related aspects. Top tagging as one 
of the major applications of subjet methods has in the past years been a rapidly developing field. While the 
first paper on top tagging was written by Mike Seymour in 1994, the developments relevant to top taggers 
used at ATLAS and CMS only started around 2008. Since then, whole conferences on the topic have been 
organized, with their proceedings effectively setting the standard in the field. As we are writing this 
review we are waiting for the first tagged top quarks to be announced by ATLAS and CMS. Obviously, we 
will not be able to give an appropriate summary of top tagging at this point in time. 

On the other hand, top tagging tools have been developed and tested at an impressive level of sophistication. 
We have found that 

• in practice, hadronic top quark identification, at least in the boosted regime, will be as easy as a $b$-tag 
or any other particle identification. 

• top taggers always include the jet mass as one jet shape observable, but two fundamentally different 
approaches rely on either the jet clustering history or additional jet shapes for the top tagging algorithm. 

• soft QCD, underlying event, and pile-up are a major challenge to top taggers, so eventually any top 
tagger will need to include an efficient way of removing these effects. 

• comparing top taggers will tell us much more than which tool to use for the identification of hadronic 
top decays. It will be a first glimpse of what physics concepts to use for QCD studies at the LHC. 

All of these aspects have been developed as the first truly new analysis concept for the LHC experiments. 
There still are many open questions which can only be answered in close collaboration between theory and 
experiment and which will make the coming years exciting times in top physics: 

• Can we actually tag tops in Standard Model events? 

• What are the efficiencies we can achieve, compared to Monte Carlo truth? 

• How well can we reconstruct the top momentum? 

• Most importantly: will we discover or measure anything new using top taggers? 

We are only now about to see the first comparisons with LHC data. Therefore, this is where our review of 
this rapidly moving field ends and the future begins. 